September 1949 


BIOMETRICS


Vol.5 No.3 
THE BIOMETRICS SECTION, AMERICAN STATISTICAL ASSOCIATION 


Optimum Allocation and Variance 
Components in Nested Sampling with 
an Application to Chemical Analysis 


Fitting a Straight Line when Both 
Variables are Subject to Error 


Relationship of Catch to Changes in 
Population Size of New England 
Haddock 


One Degree of Freedom for 
Non-Additivity 


On a Statistical Approximation to 
the Infection Interval 


Sophie Marcuse
M. S. Bartlett
Howard A. Schuck
John W. Tukey
B. Chassan


The Biometrics Section of the
American Statistical Association 


TABLE OF CONTENTS
Optimum Allocation and Variance Components in Nested Sampling
with an Application to Chemical Analysis . . . Sophie Marcuse 189
Fitting a Straight Line when Both Variables are Subject to
Error . . . M. S. Bartlett
Relationship of Catch to Changes in Population Size of New
England Haddock . . . Howard A. Schuck 213
One Degree of Freedom for Non-Additivity . . . John W. Tukey 232
On a Statistical Approximation to the Infection
Interval . . . B. Chassan
Queries . . . 250
The Biometric Society . . . 254
News and Notes . . . 256


Number 3 September 1949 Volume 5 




Material for Biometrics should be addressed to the Chairman of the Editorial 
Board, Institute of Statistics, North Carolina State College, Raleigh, N. C.; and 
material for Queries should go to "Queries", Statistical Laboratory, Iowa State
College, Ames, Iowa, or to any member of the committee. 


Officers 
American Statistical Association 


President: Simon Kuznets; President-Elect: S. S. Wilks; Vice-Presidents:
L. S. Kellogg, H. A. Freeman, D. S. Brady; Secretary-Treasurer:
M. M. Flood; Directors: C. H. Goulden, Isador Lubin, L. J. Reed,
F. F. Stephan, W. L. Thorp, L. L. Thurstone.


Officers 
Biometrics Section 


Chairman: Margaret Merrell; Secretary: Max A. Woodbury; Section 
Committee: Joseph Berkson, Harold F. Dorn, Alton S. Householder, 
Jerzy Neyman and Joseph Zubin. 


Editorial Board
Biometrics


Chairman: Gertrude M. Cox; Members: C. I. Bliss, D. J. Finney,
H. C. Fryer, O. Kempthorne, A. M. Mood, Horace Norton, G. W.
Snedecor and Jane Worcester.


Membership dues in the American Statistical Association are $8.00 a year, of which $4.00 is for a 
year’s subscription to the quarterly Journal, $.75 is for a year’s subscription to The American Statisti- 
cian. Members of the Association may subscribe to Biometrics at a special rate of $2.00. Dues for 
associate members of the Biometrics Section are $3.00 a year, of which $2.00 is for a year’s subscription 
to Biometrics. Single copies of Biometrics are $1.00 each and annual subscriptions are $3.00. Subscrip- 
tions and applications for membership should be sent to the American Statistical Association, 1603 
K Street, N.W., Washington 6, D. C. 


Entered as second-class matter, May 25, 1945, at the post office at Washington,
D.C., under the Act of March 3, 1879. Biometrics is published four times a year— 
in March, June, September and December—by the American Statistical Association 
for its Biometrics Section. Editorial Office: 1603 K Street, N.W., Washington 6, D.C. 




OPTIMUM ALLOCATION AND VARIANCE COMPONENTS IN
NESTED SAMPLING WITH AN APPLICATION TO
CHEMICAL ANALYSIS


Sophie Marcuse


U.S. Naval Research Laboratory, 
Washington, D. C. 


INTRODUCTION 


A SAMPLING TECHNIQUE frequently used in chemical and physical 
analyses for estimating the mean of a population is that of multiple 
random subsampling, called nested sampling by P. C. Mahalanobis.¹
For instance, when determining the moisture content of cheese, a food 
chemist might wish to select his samples randomly from different lots, 
and again from different cheeses of each lot, and finally make duplicate 
determinations on each cheese. A primary objective in the statistical 
design of such a sampling procedure is to minimize the cost of obtaining 
the sample estimate if the desired degree of precision is fixed, or con- 
versely, to maximize the precision of the estimate obtained from a 
given amount of expenditure including personnel, time, and equipment. 
The question arises as to how the number of sampling units at each 
level should be determined to meet these optimum requirements assum- 
ing equal frequencies in the subclasses. 

It is assumed in this paper that at each classification level, the cost 
is proportional to the number of units sampled at this level, and that 
the cost per sampling unit is known. Thus the total cost is a linear 
function of the numbers of sampling units at the various levels, with co- 
efficients representing the (known) costs per sampling unit at these levels. 
On the other hand, the precision of the mean yielded by the experiment 
can be expressed in terms of the variance of this sample mean; it will 
then also be a linear function of the variances corresponding to each 
level, with coefficients involving the reciprocals of the number of units 
at the various levels. If the variances at the various levels are not 
known, they should be estimated from a preliminary experiment. The 
present paper discusses optimum allocation of the sampling units in 
nested sampling in terms of 3 levels. As an illustration of an experi- 
mental situation, a numerical example is given involving the estimation 
of variance components. In the appendix, the formulas for optimum 
allocation in nested sampling with k levels are derived. 


¹For reference see M. Ganguli's paper on Nested Sampling [7].




For concreteness, we consider the above mentioned specific problem 
of planning in the most economical way an experiment in food chemistry 
designed to determine the moisture content of cheese, the subsampling 
levels involving lots, cheeses, and determinations. Clearly, the princi- 
ples elucidated in terms of this particular problem for 3 levels are 
applicable to a wider class of problems involving more levels in sub- 
sampling, as, for instance, by expanding this simplified experiment to 
more than one factory. Also, they may be applied to other than 
chemical investigations involving nested sampling, for instance: in the 
determination of the breaking strength of a certain type of bronze, a 
metallurgist may wish to choose random samples from different ladles, 
then again from different molds of each ladle, and make duplicate de- 
terminations on the samples from each mold; in a manufacturing process, 
the subsampling categories may be lots, bags, and batches; in a gunnery 
experiment, test shooting may be done by different operators taking 
a number of observations on different runs; in agricultural investiga- 
tions, the entire area under survey may be subdivided into a large 
number of zones, these in turn into a large number of smaller zones, 
and so on; in studies of spray deposit in insect work, plots, trees, and 
apple samples have been used as subsampling levels [2]. Examples of 
nested sampling in biological and industrial work together with analyses 
of variance components may be found in G. W. Snedecor’s [10] and 
L. H. C. Tippett’s [12] books. In designing a sample survey for esti- 
mating the jute crop in India, P. C. Mahalanobis [9] has used the cost 
function for considerations of optimum allocation and discussed their 
general application to large scale sample surveys; principles of optimum 
allocation in nested sampling have been used by M. H. Hansen et al. 
[8] in a sample survey of business involving 2-fold nested sampling 
from finite populations (countries, stores), and by L. H. C. Tippett [12] 
who describes an experiment where in obtaining soil samples from 
counts of cysts, a number of “borings” of soil were taken and then 
several counts made on each boring. 


DEFINITION OF NESTED SAMPLING 


The problem considered is one in which the total population is sub- 
divided into primary sampling units (lots); these in turn are subdivided 
into secondary sampling units (cheeses) on which several measurements 
(determinations) are made representing the tertiary sampling units. 
The nested sample is obtained by selecting at random first $n_1$ primary
(lots), then $n_2$ secondary (cheeses), and finally $n_3$ tertiary sampling
units (determinations) from each of the preceding units, where $n_1$, $n_2$,
$n_3$ represent the class frequencies. A measure of the variance of the
sample mean in terms of the class frequencies is desired. Before deriving
it, the structure of the mathematical model will be explained.

Let $x_{hij}$ denote the $j$-th determination from the $i$-th cheese of the
$h$-th lot. Assuming that the effects of the sampling units at the different
levels are additive, we may describe an individual observation $x_{hij}$ in
nested sampling [7] as:

$$x_{hij} = \mu + \xi_h + \eta_{hi} + \zeta_{hij} \tag{1}$$

$h = 1, 2, \ldots, n_1$, where $h$ refers to the lot of cheese
$i = 1, 2, \ldots, n_2$, where $i$ refers to the cheese in each lot
$j = 1, 2, \ldots, n_3$, where $j$ refers to the determination on each cheese.

The value $\mu$ represents the general population mean and is thus a fixed
constant. The components $\xi_h$, $\eta_{hi}$, $\zeta_{hij}$ are random
variables with means and covariances equal to zero and with variances equal to
$\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, respectively, called variance
components. Thus the components $\xi_h$, $\eta_{hi}$, $\zeta_{hij}$ represent
the effects peculiar to the lots, cheeses, and determinations, and the
variance components the variabilities at the different levels.


VARIANCE OF SAMPLE MEAN AND ESTIMATION OF VARIANCE COMPONENTS 
IN NESTED SAMPLING 


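From the definition of an individual observation $x_{hij}$ in nested
sampling, given by equation (1), we have for the sample mean

$$\bar{x} = \mu + \frac{1}{n_1}\sum_{h}\xi_h + \frac{1}{n_1 n_2}\sum_{h}\sum_{i}\eta_{hi} + \frac{1}{n_1 n_2 n_3}\sum_{h}\sum_{i}\sum_{j}\zeta_{hij} \tag{2}$$

Then because of the assumptions made for the random variables $\xi_h$,
$\eta_{hi}$, $\zeta_{hij}$ we obtain for the variance of the sample mean

$$\sigma_{\bar{x}}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3} \tag{3}$$

This expression gives the variance or precision of the sample mean as a
linear function of the reciprocals of $n_1$, $n_1 n_2$, and $n_1 n_2 n_3$,
representing the total number of lots, cheeses, and determinations used. The
coefficients are the variance components $\sigma_1^2$, $\sigma_2^2$,
$\sigma_3^2$, being the variances encountered at the 3 subsampling levels.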
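Equation (3) can be evaluated mechanically for any set of class frequencies. The following is a minimal sketch, not part of the original paper: the function name `variance_of_mean` is an illustrative assumption, and the numbers are the advance estimates and the 3-2-2 design that appear later in the numerical example.

```python
def variance_of_mean(var_components, class_freqs):
    """Variance of the sample mean in balanced nested sampling, equation (3).

    var_components: (sigma_1^2, sigma_2^2, ...), one per level
    class_freqs:    (n_1, n_2, ...), units drawn per unit of the level above
    """
    total = 0.0
    units = 1  # running product n_1 * n_2 * ... * n_i
    for sigma2, n_i in zip(var_components, class_freqs):
        units *= n_i
        total += sigma2 / units
    return total


# Assumed inputs: the estimates of the numerical example, 3 lots x 2 cheeses x 2 determinations.
v = variance_of_mean((3.2028, 0.0143, 0.1103), (3, 2, 2))
print(round(v, 4))
```

Because the loop keeps a running product of the class frequencies, the same function covers any number of levels.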

As long as the parameter values $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ are
unknown, the variance function $\sigma_{\bar{x}}^2$ in (3) cannot be used for
solving the problem to determine the optimum values of the class frequencies.
On the other hand, if a set of class frequencies were given and used in
performing an experiment in nested sampling, then the unknown parameters
$\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ could be estimated from an analysis
of variance of the experimental data. This dilemma² may be evaded by first
carrying out a preliminary experiment in nested sampling³ using a set of
arbitrarily chosen class frequencies. We will show how the data obtained from
such a preliminary experiment give advance estimates of $\sigma_1^2$,
$\sigma_2^2$, $\sigma_3^2$, say $s_1^2$, $s_2^2$, $s_3^2$, to be used for
estimating the coefficients of the variance function.


TABLE 1
ANALYSIS OF VARIANCE IN 3-FOLD NESTED SAMPLING

Source of Variation                            | Degrees of Freedom       | Mean Square | Expected Mean Square
Primary sampling units                         | $n_1^* - 1$              | $MS_1$      | $\sigma_3^2 + n_3^*\sigma_2^2 + n_2^* n_3^*\sigma_1^2$
Secondary sampling units within primary units  | $n_1^*(n_2^* - 1)$       | $MS_2$      | $\sigma_3^2 + n_3^*\sigma_2^2$
Tertiary sampling units within secondary units | $n_1^* n_2^*(n_3^* - 1)$ | $MS_3$      | $\sigma_3^2$


Denote by $n_1^*$, $n_2^*$, $n_3^*$ the given class frequencies of the
preliminary experiment in nested sampling. Perform a customary analysis of
variance on the observed data, as shown in the first 3 columns of table 1,
where $MS_1$, $MS_2$, and $MS_3$ denote the mean squares corresponding to
the primary, secondary, and tertiary sampling units. It can be shown
that the expected values of the mean squares $MS_1$, $MS_2$, and $MS_3$ are
the expressions shown in the last column of table 1.⁴ Considering the
estimates of these expressions by substituting the estimated variance
components $s_1^2$, $s_2^2$, $s_3^2$, we obtain the equations

$$\begin{aligned}
MS_1 &= s_3^2 + n_3^* s_2^2 + n_2^* n_3^* s_1^2 \\
MS_2 &= s_3^2 + n_3^* s_2^2 \\
MS_3 &= s_3^2
\end{aligned} \tag{4}$$


²See M. Friedman's discussion of a similar situation in planning an experiment ([11], p. 345).

³Or a mixed model design of experiment (e.g. randomized blocks or split plot) which includes the
subsampling categories under consideration. Note that such a design might involve more degrees of
freedom, thus increasing the reliability of the estimated variance components ([3], [4]).

⁴Results for any number of sub-samplings and unequal frequencies are given by M. Ganguli [7].


Whence we have the solutions

$$\begin{aligned}
s_3^2 &= MS_3 \\
s_2^2 &= \frac{MS_2 - MS_3}{n_3^*} \\
s_1^2 &= \frac{MS_1 - MS_2}{n_2^* n_3^*}
\end{aligned} \tag{5}$$

in which the estimated variance components are expressed in terms of
the mean squares calculated in the analysis of variance table of the
experimental data from nested sampling.⁵ These equations can be
extended from three to k subsamplings by the same reasoning.
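As a small illustration of equations (5), the sketch below solves for the three estimated components from given mean squares. The function name is hypothetical, and the inputs are the mean squares reported in table 3 of the numerical example.

```python
def components_from_mean_squares(ms1, ms2, ms3, n2_star, n3_star):
    """Solve equations (4) for s_1^2, s_2^2, s_3^2, as in equations (5)."""
    s3_sq = ms3
    s2_sq = (ms2 - ms3) / n3_star
    s1_sq = (ms1 - ms2) / (n2_star * n3_star)
    return s1_sq, s2_sq, s3_sq


# Mean squares from table 3; preliminary design with n2* = n3* = 2.
s1_sq, s2_sq, s3_sq = components_from_mean_squares(12.9501, 0.1389, 0.1103, 2, 2)
print(round(s1_sq, 4), round(s2_sq, 4), round(s3_sq, 4))
```

Each component is a difference of successive mean squares divided by the number of determinations per unit of that level, which is why the same pattern carries over to k levels.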


OPTIMUM ALLOCATION IN 3-FOLD NESTED SAMPLING 


The variance of the sample mean and the total cost expenditure for
determining it, expressed in terms of the class frequencies, are the two
functions needed for solving the optimum allocation problem under
consideration. Considering the case of 3 levels, let $C(n_1, n_2, n_3)$ be the
cost function and $V(n_1, n_2, n_3)$ the variance function, the variables
$n_1$, $n_2$, $n_3$ representing the class frequencies. As given by equation
(6), the cost function $C(n_1, n_2, n_3)$ is assumed to be an additive function
of the costs at the three levels, that is the costs of $n_1$ primary, $n_1 n_2$
secondary, and $n_1 n_2 n_3$ tertiary sampling units altogether, the cost per
primary, secondary, and tertiary sampling unit being $c_1$, $c_2$, and $c_3$
respectively. The variance function $V(n_1, n_2, n_3)$ is given by equation
(3) showing the variance of the sample mean, $\sigma_{\bar{x}}^2$, in 3-fold
nested sampling; its parameters may be estimated from the data of a
preliminary experiment by the analysis of variance procedure for estimating
variance components as described above. Thus we have:

$$C(n_1, n_2, n_3) = c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3 \tag{6}$$

$$V(n_1, n_2, n_3) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3} \tag{3}$$


The problem of optimum allocation is to minimize $C(n_1, n_2, n_3)$ by
proper choice of $n_1$, $n_2$, $n_3$ subject to the constraint that the
allowable amount of variance is preassigned, say $v$, or to minimize
$V(n_1, n_2, n_3)$ by proper choice of $n_1$, $n_2$, $n_3$ subject to the
constraint that the total amount of cost is fixed, say $c$. Let $n_{c1}$,
$n_{c2}$, $n_{c3}$ and $n_{v1}$, $n_{v2}$, $n_{v3}$ be the optimum solutions of
the two problems respectively. By applying Lagrange multipliers it can be
shown⁶ that these optimum values of $n_1$, $n_2$, $n_3$ are

$$\begin{aligned}
n_{c1} &= \frac{\sigma_1}{v\sqrt{c_1}}\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2} + \sigma_3\sqrt{c_3}\right) \\
n_{c2} &= \frac{\sigma_2}{\sigma_1}\sqrt{\frac{c_1}{c_2}} \\
n_{c3} &= \frac{\sigma_3}{\sigma_2}\sqrt{\frac{c_2}{c_3}}
\end{aligned} \tag{7}$$

$$\begin{aligned}
n_{v1} &= \frac{c\,\sigma_1}{\sqrt{c_1}\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2} + \sigma_3\sqrt{c_3}\right)} \\
n_{v2} &= \frac{\sigma_2}{\sigma_1}\sqrt{\frac{c_1}{c_2}} \\
n_{v3} &= \frac{\sigma_3}{\sigma_2}\sqrt{\frac{c_2}{c_3}}
\end{aligned} \tag{8}$$

⁵This analysis of the variance components was performed on data from nested sampling, which
is a special case of Model II analysis of variance as shown below. If a similar analysis of variance
components is routinely carried out on data belonging to Model I, the interpretation differs. In
Model II, the computed variance components estimate the variances $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ associated with
random factors, whereas in Model I, these are dummy symbols representing sums of squares of differences
related to the variation of systematic (or fixed) factors ([1], [5]).


The sets of equations (7) and (8) show similar features. Except 
for the first level, the optimum combination of the number of sampling 
units is independent of the given degree of precision or the fixed total 
cost, being the same whether the precision or the amount of cost is 
assigned beforehand. Therefore, when planning an experiment in 
nested sampling the analyst need be concerned with the given cost or 
precision only in selecting the number of primary sampling units. 
Clearly, an increase in funds would be utilized most efficiently, that 
is resulting in the highest possible precision, by a proportional increase 
in the number of primary sampling units, and similarly, the most 
economical way for attaining a higher degree of precision would consist 
in choosing a correspondingly greater number of primary sampling 
units. 
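The two properties just described (only the first class frequency depends on the budget or the required precision, and it grows in proportion to the budget) can be checked numerically. This is a minimal sketch of equations (7) and (8); the function names and the variance components and unit costs used are hypothetical values chosen only to give round numbers.

```python
from math import sqrt


def min_cost_allocation(sigma2, cost, v):
    """Equation (7): frequencies minimizing cost for a preassigned variance v."""
    s = [sqrt(x) for x in sigma2]  # sigma_1, sigma_2, sigma_3
    r = [sqrt(x) for x in cost]    # sqrt(c_1), sqrt(c_2), sqrt(c_3)
    total = sum(si * ri for si, ri in zip(s, r))
    n1 = s[0] * total / (v * r[0])
    n2 = (s[1] / s[0]) * (r[0] / r[1])
    n3 = (s[2] / s[1]) * (r[1] / r[2])
    return n1, n2, n3


def min_variance_allocation(sigma2, cost, c):
    """Equation (8): frequencies minimizing variance for a fixed total cost c."""
    s = [sqrt(x) for x in sigma2]
    r = [sqrt(x) for x in cost]
    total = sum(si * ri for si, ri in zip(s, r))
    n1 = c * s[0] / (r[0] * total)
    n2 = (s[1] / s[0]) * (r[0] / r[1])
    n3 = (s[2] / s[1]) * (r[1] / r[2])
    return n1, n2, n3


# Hypothetical inputs, not taken from the paper.
sigma2 = (4.0, 1.0, 0.25)
cost = (16.0, 4.0, 1.0)

small = min_variance_allocation(sigma2, cost, c=105.0)
large = min_variance_allocation(sigma2, cost, c=210.0)
print(small)  # n1 for the smaller budget, with n2 and n3
print(large)  # doubling the budget doubles n1 and leaves n2, n3 unchanged
print(min_cost_allocation(sigma2, cost, v=1.05))
```

With these inputs the allocation for a budget of 105 attains a variance of 1.05, and asking for a variance of 1.05 at minimum cost returns the same frequencies, which illustrates that the two problems share one family of solutions.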

⁶See appendix for development of these formulas.

In many instances, the research analyst might not wish to depend
on considerations of optimum allocation in the choice of the frequencies
at all levels, but might prefer to take, for instance, duplicate or triplicate
determinations from each cheese for check purposes, thus preassigning
the class frequency associated to the tertiary sampling unit, $n_3$. If
$n_3$ is prefixed, the corresponding optimum allocation formulas⁷ are


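$$\begin{aligned}
n_{c1} &= \frac{\sigma_1}{v\sqrt{c_1}}\left[\sigma_1\sqrt{c_1} + \sqrt{\left(\sigma_2^2 + \frac{\sigma_3^2}{n_3}\right)\left(c_2 + c_3 n_3\right)}\right] \\
n_{c2} &= \frac{1}{\sigma_1}\sqrt{\frac{c_1\left(\sigma_2^2 + \sigma_3^2/n_3\right)}{c_2 + c_3 n_3}}
\end{aligned} \tag{9}$$

in the case that the variance $v$ is given; and

$$\begin{aligned}
n_{v1} &= \frac{c\,\sigma_1}{\sqrt{c_1}\left[\sigma_1\sqrt{c_1} + \sqrt{\left(\sigma_2^2 + \frac{\sigma_3^2}{n_3}\right)\left(c_2 + c_3 n_3\right)}\right]} \\
n_{v2} &= \frac{1}{\sigma_1}\sqrt{\frac{c_1\left(\sigma_2^2 + \sigma_3^2/n_3\right)}{c_2 + c_3 n_3}}
\end{aligned} \tag{10}$$

in the case that the total cost $c$ is given.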
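Formulas (9) and (10) differ from (7) and (8) only in that the second-level variance and cost absorb the fixed third level. A minimal sketch follows; the function name and keyword arguments are illustrative assumptions, and the inputs are the estimates and costs used in the numerical example of the next section.

```python
from math import sqrt


def allocation_fixed_n3(sigma2, cost, n3, *, c=None, v=None):
    """Optimum n1 and n2 when n3 is preassigned: (9) if v is given, (10) if c is given."""
    if (c is None) == (v is None):
        raise ValueError("give exactly one of c (total cost) or v (variance)")
    s1_sq, s2_sq, s3_sq = sigma2
    c1, c2, c3 = cost
    s2_eff = s2_sq + s3_sq / n3  # sigma_2^2 + sigma_3^2 / n3
    c2_eff = c2 + c3 * n3        # c_2 + c_3 * n3
    bracket = sqrt(s1_sq * c1) + sqrt(s2_eff * c2_eff)
    if v is not None:
        n1 = sqrt(s1_sq / c1) * bracket / v   # equation (9)
    else:
        n1 = c * sqrt(s1_sq / c1) / bracket   # equation (10)
    n2 = sqrt(c1 * s2_eff / (c2_eff * s1_sq))
    return n1, n2


# Inputs from the numerical example: duplicates (n3 = 2) and a 60 dollar budget.
n1, n2 = allocation_fixed_n3((3.2028, 0.0143, 0.1103), (10, 3, 1), 2, c=60)
print(round(n1, 2), round(n2, 2))
```

The non-integer results still have to be rounded to a feasible design, as discussed in the numerical example.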


NUMERICAL EXAMPLE 


The figures shown in table 2 are results from analyses of samples
of cheese for the determination of moisture content.⁸ They will serve
as the preliminary data for obtaining estimates of the variance
components. The experimental set-up in nested sampling involves duplicate
determinations made on 2 cheeses from each of 3 lots, the different
cheeses and the different lots being randomly selected ($n_1^* = 3$,
$n_2^* = n_3^* = 2$).

⁷See appendix for development of formulas in which all but the first k' are fixed.

⁸The data are drawn from "Report on Sampling Fat and Moisture in Cheese" by William Horwitz
and Lila F. Knudsen, J. Ass. Off. Agr. Chem., vol. 31 (1948), pp. 300-306; slight modifications have
been made for illustrative purposes. The author acknowledges the suggestions of Lila F. Knudsen;


TABLE 2
MOISTURE CONTENT OF 2 CHEESES FROM EACH OF 3 DIFFERENT LOTS,
DETERMINED 2 TIMES

                    Lot
Cheese      I        II       III
1         39.02    35.74    37.02
          38.79    35.41    36.00
2         38.96    35.58    35.70
          39.01    35.52    36.04


The first 4 columns of table 3 show the results of an analysis of
variance of these data. In nested sampling the sums of squares may
be calculated as follows: Consider first table 2 (in which there are 3
factors: duplicates, cheeses, and lots) and refer to the figures,
representing 1 determination, as "totals." Subsequently, obtain the totals
for the duplicates on each cheese (there remain 2 factors: cheeses and
lots), and also the totals of the 4 determinations on each lot (there
remains 1 factor: lots), in addition to the total for the entire table (no
factor remains). Denote by $Q_3$, $Q_2$, $Q_1$, and $Q_0$ the sums of squares
of these corresponding totals divided by the number of determinations
making up each total:

$$Q_3 = 39.02^2 + 38.79^2 + \cdots + 35.70^2 + 36.04^2 = 16{,}365.5607$$

$$Q_2 = \frac{77.81^2 + 77.97^2 + 71.15^2 + 71.10^2 + 73.02^2 + 71.74^2}{2} = 16{,}364.8988$$

$$Q_1 = \frac{155.78^2 + 142.25^2 + 144.76^2}{4} = 16{,}364.4821$$

$$Q_0 = \frac{442.79^2}{12} = 16{,}338.5820$$

Then the sums of squares in analysis of variance, $SS_1$, $SS_2$, $SS_3$, are
the successive differences of these expressions:

$$\begin{aligned}
SS_1 &= Q_1 - Q_0 = 25.9001 \\
SS_2 &= Q_2 - Q_1 = 0.4166 \\
SS_3 &= Q_3 - Q_2 = 0.6620^{9}
\end{aligned}$$


TABLE 3
ANALYSIS OF VARIANCE OF DATA ON MOISTURE CONTENT OF CHEESE
GIVEN IN TABLE 2

Source of Variation           | Degrees of Freedom | Sum of Squares    | Mean Square       | Expected Mean Square                       | Estimated Variance Components
Lots                          | 2                  | $SS_1 = 25.9001$  | $MS_1 = 12.9501$  | $\sigma_3^2 + 2\sigma_2^2 + 4\sigma_1^2$   | $s_1^2 = 3.2028$
Cheeses within lots           | 3                  | $SS_2 = .4166$    | $MS_2 = .1389$    | $\sigma_3^2 + 2\sigma_2^2$                 | $s_2^2 = .0143$
Determinations within cheeses | 6                  | $SS_3 = .6620$    | $MS_3 = .1103$    | $\sigma_3^2$                               | $s_3^2 = .1103$


The sums of squares and the corresponding mean squares are shown
in columns 3 and 4 of table 3. The estimated variance components
$s_1^2$, $s_2^2$, $s_3^2$, shown in the last column of table 3, follow from
equations (5). These values represent the advance estimates from the
preliminary data to be used in the planning of the experiment.
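The computation of table 3 can be retraced from the raw figures of table 2. The following sketch follows the totals-based procedure described above; the variable names and the nested-list layout of the data are illustrative assumptions.

```python
# Table 2: data[lot] is a list of cheeses, each a list of duplicate determinations.
data = {
    "I": [[39.02, 38.79], [38.96, 39.01]],
    "II": [[35.74, 35.41], [35.58, 35.52]],
    "III": [[37.02, 36.00], [35.70, 36.04]],
}
n1, n2, n3 = 3, 2, 2  # lots, cheeses per lot, determinations per cheese

dets = [x for lot in data.values() for cheese in lot for x in cheese]
cheese_totals = [sum(cheese) for lot in data.values() for cheese in lot]
lot_totals = [sum(sum(cheese) for cheese in lot) for lot in data.values()]

# Sums of squared totals, each divided by the number of determinations per total.
q3 = sum(x * x for x in dets)
q2 = sum(t * t for t in cheese_totals) / n3
q1 = sum(t * t for t in lot_totals) / (n2 * n3)
q0 = sum(dets) ** 2 / (n1 * n2 * n3)

ss1, ss2, ss3 = q1 - q0, q2 - q1, q3 - q2
ms1 = ss1 / (n1 - 1)
ms2 = ss2 / (n1 * (n2 - 1))
ms3 = ss3 / (n1 * n2 * (n3 - 1))

# Equations (5)
s3_sq = ms3
s2_sq = (ms2 - ms3) / n3
s1_sq = (ms1 - ms2) / (n2 * n3)

print(f"SS: {ss1:.4f} {ss2:.4f} {ss3:.4f}")
print(f"MS: {ms1:.4f} {ms2:.4f} {ms3:.4f}")
print(f"s^2: {s1_sq:.4f} {s2_sq:.4f} {s3_sq:.4f}")
```

Carried out in floating point, the last decimal of a sum of squares may differ from the printed table by one unit, the same rounding effect that the footnote to $SS_3$ mentions.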

The problem of designing an experiment with optimum allocation
may arise in chemical laboratory work, e.g., when it is desired to set
up in the most economical way routine analyses of samples of cheese
for the determination of moisture content. In the example under
consideration we assume that the chemist wants to spend not more
than 60 dollars altogether, to be allocated in such a way that the highest
precision results; that he requires duplicate determinations for check
purposes; and that the cost factors per lot, cheese, and determination
are 10, 3, and 1 dollar respectively. Since these requirements prefix
the class frequency $n_3$ and the total cost $c$, formulas (10) are
appropriate. Substituting $n_3 = 2$, $c = 60$, $c_1 = 10$, $c_2 = 3$, and
$c_3 = 1$, and for the variances $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$
their estimates $s_1^2 = 3.2028$, $s_2^2 = 0.0143$, $s_3^2 = 0.1103$, we obtain:

$$n_{v1} = 5.43 \qquad n_{v2} = 0.2$$


The corresponding integer values have to be chosen in accordance with
the conditions of the experiment. Since $n_2$, the number of cheeses
selected from each lot, must be at least one, the number of lots, $n_1$,
may be reduced. An examination of the integers smaller than $n_{v1}$
shows that $n_1 = 4$ together with $n_2 = 1$ fulfill the required conditions.
Thus 4 lots and 1 cheese give the optimum solution for the problem
under consideration.
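The choice of integers can also be confirmed by direct enumeration. The sketch below is an assumption-laden check rather than the author's procedure: it scans a small grid of integer designs (the grid bounds are arbitrary), keeps those costing at most 60 dollars with duplicate determinations, and picks the one with the smallest estimated variance.

```python
from itertools import product

S2 = (3.2028, 0.0143, 0.1103)  # advance estimates s_1^2, s_2^2, s_3^2
COST = (10, 3, 1)              # dollars per lot, cheese, determination
N3 = 2                         # duplicate determinations are required
BUDGET = 60


def total_cost(n1, n2, n3=N3):
    """Equation (6)."""
    return COST[0] * n1 + COST[1] * n1 * n2 + COST[2] * n1 * n2 * n3


def variance(n1, n2, n3=N3):
    """Equation (3) with the estimated components."""
    return S2[0] / n1 + S2[1] / (n1 * n2) + S2[2] / (n1 * n2 * n3)


# Arbitrary search bounds: up to 6 lots and up to 10 cheeses per lot.
feasible = [
    (variance(n1, n2), n1, n2)
    for n1, n2 in product(range(1, 7), range(1, 11))
    if total_cost(n1, n2) <= BUDGET
]
best_variance, best_n1, best_n2 = min(feasible)
print(best_n1, best_n2, round(best_variance, 4), total_cost(best_n1, best_n2))
```

Under these constraints the search returns 4 lots and 1 cheese per lot, in agreement with the argument above.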

The merit of this optimum combination may be judged by comparing it to
other combinations of class frequencies. In table 4 a number of various
combinations (columns 1 and 2) are presented together with the precision
of the sample mean (columns 5 and 6) and the expenditure involved in
determining it (columns 3 and 4). Column 3 shows the total number of
determinations made, the total cost is given in column 4, and column 6
compares the relative precision of the sample mean, indicated by its
coefficient of variation, to the absolute precision in terms of the variance
(column 5). Duplicate determinations are used throughout. It can be seen
that the 4-1-2 combination is more economical than the 3-2-2 combination
(the one used in the preliminary experiment) since it obtains a higher
precision but requires the same cost (60 dollars). Also, the combination
3-2-2 is less efficient than the combination 3-1-2 since, for nearly the
same precision (a coefficient of variation of 2.82 against 2.83), the
latter combination needs half the number of determinations and requires
only 45 dollars instead of 60 dollars. In general, it pays to increase the
number of lots instead of the number of cheeses since the former are more
variable.

⁹Using the figures given for $Q_2$, $Q_3$ above, we have $Q_3 - Q_2 = .6619$ instead of .6620. Such a
difference in the last decimal place is due to rounding off results, intermediate computations being carried
out to more decimal places.


TABLE 4
ESTIMATED PRECISION AND COST OF DETERMINING MOISTURE CONTENT OF
CHEESE WHEN A SPECIFIED NUMBER OF LOTS ($n_1$) AND A SPECIFIED NUMBER OF
CHEESES FROM EACH LOT ($n_2$) ARE USED AND TWO DETERMINATIONS ($n_3 = 2$)
ARE MADE ON EACH CHEESE. CONSTANTS USED ARE ADVANCE ESTIMATES
CALCULATED FROM PRELIMINARY DATA (TABLES 2 AND 3).

Formulas used:

$$N = n_1 n_2 n_3$$

$$C = c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3$$

$$V = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_1 n_2} + \frac{s_3^2}{n_1 n_2 n_3}$$

$$CV = \frac{\sqrt{V}}{\bar{x}} \times 100$$

Constants used:

$n_3 = 2$
$c_1 = 10$, $c_2 = 3$, $c_3 = 1$
$s_1^2 = 3.2028$, $s_2^2 = .0143$, $s_3^2 = .1103$
$\bar{x} = 36.90$

     Number of          |          Expenditure            |     Estimated Precision
 Lots      Cheeses      | Number of        Total Cost     | Variance     Coefficient
                        | Determinations   in dollars     | of mean      of Variation
 $n_1$     $n_2$        | $N$              $C$            | $V$          $CV$
 (1)       (2)          | (3)              (4)            | (5)          (6)
  5         3           |  30              125            | 0.6452       2.18
  5         2           |  20              100            | 0.6475       2.18
  5         1           |  10               75            | 0.6544       2.19
  4         3           |  24              100            | 0.8065       2.43
  4         2           |  16               80            | 0.8094       2.44
  4         1           |   8               60            | 0.8181       2.45
  3         3           |  18               75            | 1.0753       2.81
  3         2           |  12               60            | 1.0792       2.82
  3         1           |   6               45            | 1.0907       2.83
  2         3           |  12               50            | 1.6130       3.44
  2         2           |   8               40            | 1.6188       3.45
  2         1           |   4               30            | 1.6361       3.47
  1         3           |   6               25            | 3.2259       4.87
  1         2           |   4               20            | 3.2375       4.88
  1         1           |   2               15            | 3.2722       4.90
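The three designs compared in this paragraph can be recomputed from the formulas and constants listed with table 4. A short sketch, with the helper name `summarize` as an illustrative assumption:

```python
from math import sqrt

S2 = (3.2028, 0.0143, 0.1103)  # s_1^2, s_2^2, s_3^2
COST = (10, 3, 1)              # c_1, c_2, c_3 in dollars
MEAN = 36.90                   # general mean used for the coefficient of variation


def summarize(n1, n2, n3=2):
    """Return (N, C, V, CV) for one design, using the formulas of table 4."""
    n_total = n1 * n2 * n3
    c_total = COST[0] * n1 + COST[1] * n1 * n2 + COST[2] * n_total
    v = S2[0] / n1 + S2[1] / (n1 * n2) + S2[2] / n_total
    cv = sqrt(v) / MEAN * 100
    return n_total, c_total, v, cv


for design in [(4, 1), (3, 2), (3, 1)]:
    n_total, c_total, v, cv = summarize(*design)
    print(design, n_total, c_total, round(v, 4), round(cv, 2))
```

Looping the same helper over all fifteen pairs of $n_1$ and $n_2$ gives the remaining rows of the table.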


REMARKS ON NESTED SAMPLING AS A SPECIAL CASE OF 
MODEL II ANALYSIS OF VARIANCE 

The mathematical model of nested sampling as given by the funda- 
mental equation (1) and its assumptions, is closely related to one 
specific mathematical model used in analysis of variance. Two models 
of analysis of variance, usually referred to as Model I and Model II, 
have been discussed recently by S. L. Crump [3] and C. Eisenhart [5]. 
It seems worthwhile to show that, in virtue of the underlying assump- 
tions, nested sampling represents a special case of Model II of analysis 
of variance. 

The two different models of analysis of variance involve the analysis 
of two different types of factors: systematic factors in Model I and 
random factors in Model II. A factor such as "treatment" or "lot"
is a random or a systematic factor depending on the way its variants
are chosen. Here the term "variant" of a factor is used based on Fisher's
terminology [6]; for instance, the variants of the factor "treatment"
may be e.g. "nitrogen" and "phosphate," and different lots the variants
of the factor "lot." When an experimenter selects the two treatments
"nitrogen" and "phosphate," he selects them systematically from a
population of possible treatments on the basis of subject matter
judgment; on the other hand, when selecting different lots of material for
studying the effects of the treatments, he generally bases his choice on
random selection ([5], [10] Chapter 8). Since systematically chosen
variants produce systematic variation and randomly chosen variants
random variation, the type of factor may be determined according to
the issue: systematic or random variation. Usually, "methods" and
"treatments" represent systematic factors, "blocks" and "lots" random
factors, whereas factors such as "days" or "animals" or "locations"
may represent either systematic or random factors; both types of factor
will often occur in the same experiment; then the model is a mixed one.




Now the factors encountered in nested sampling are the primary, 
secondary, tertiary sampling units (lots, cheeses, determinations). 
Under the assumptions made, the variants of these factors, i.e. the 
units selected at each level, were chosen randomly. These factors, 
therefore, are random factors and thus nested sampling belongs to 
Model II. 

In order to describe more accurately the relationship of nested
sampling to Model II of analysis of variance, we subdivide the random
factors of Model II into two categories: cross classified¹⁰ with respect to
another factor or not. For instance, in the 2 factor "day-animal"
experiment discussed by C. Eisenhart [5] as an example of Model II,
the random factor "animal" is cross classified with respect to the factor
"days," each of the randomly chosen animals being tested on all days
(the analysis of variance table contains: "Between days," "Between
animals," and "Residual" with $d - 1$, $a - 1$, and $(a - 1)(d - 1)$
degrees of freedom respectively). On the other hand, there would be
no cross classification, if on each day a number of animals were randomly
chosen for testing, as for instance in an inoculation experiment affecting
the sensitivity of the animal (the analysis of variance contains:
"Between days," and "Between animals within days" with $d - 1$, and
$d(a - 1)$ degrees of freedom respectively). Likewise, no cross
classification would be involved for the random factor "animal" if each
animal would be tested on a couple of days which were randomly
selected, as e.g. if only one animal could be tested per day (the analysis
of variance contains: "Between animals," and "Between days within
animals" with $(a - 1)$, and $a(d - 1)$ degrees of freedom respectively).
Nested sampling represents the second category of Model II in which
the random factors involved are not cross classified since for each
primary sampling unit a number of secondary sampling units is
selected randomly, and so on. The question as to which order of
subsampling should be adopted in the nested sampling procedure, as, for
instance, whether to use "animals" as primary sampling units and
"days" as secondary sampling units, or conversely, is a decision to be
made on the basis of subject matter judgment.


APPENDIX 


We shall now derive the optimum values of the class frequencies,
given for the three-fold level by formulas (7), (8), (9), and (10), for
the general case of k-fold nested sampling. Instead of solving the
problem directly by introducing the Lagrange multiplier, we will apply
this procedure to a pair of generalized functions. We then obtain as
special cases the solution formulas for optimum allocation in
i. k-fold nested sampling
ii. k-fold nested sampling in which some class frequencies are fixed
beforehand
iii. stratified sampling from finite population (k strata, 2 levels).

¹⁰This term is not synonymous with "ordered". Note that items in table 2 are ordered for
purely designative reasons, there being neither a cross classification nor an element of "sequence"
involved.


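a. Minimum Problem for 2 Generalized Functions

Let the two generalized functions be

$$F_1(N_1, \ldots, N_k) = \sum_{i=1}^{k} a_{1i} N_i + a_1 \tag{11}$$

$$F_2(N_1, \ldots, N_k) = \sum_{i=1}^{k} \frac{a_{2i}}{N_i} + a_2 \tag{12}$$

where $N_1, \ldots, N_k$ denote variables and $a_1$, $a_2$, and $a_{1i}$,
$a_{2i}$ $(i = 1, \ldots, k)$ are constants.

Consider first the problem to minimize $F_1(N_1, \ldots, N_k)$ subject to
the side condition

$$F_2(N_1, \ldots, N_k) = b_2 \tag{13}$$

where $b_2$ is a constant. Using the Lagrange multiplier $\lambda$ in the usual
way, we let the derivatives of $F_1 + \lambda F_2$ with respect to $N_i$
$(i = 1, \ldots, k)$ be zero, and obtain

$$a_{1i} - \lambda\,\frac{a_{2i}}{N_i^2} = 0$$

or

$$N_i = \sqrt{\lambda}\,\sqrt{\frac{a_{2i}}{a_{1i}}}$$

Substituting these values of $N_i$ in (13), where $F_2$ is given by (12), we have

$$F_2(N_1, \ldots, N_k) = \frac{1}{\sqrt{\lambda}} \sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}} + a_2 = b_2$$

Therefore

$$\sqrt{\lambda} = \frac{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}}{b_2 - a_2}$$

Hence we obtain the optimum values

$$N_i = \sqrt{\frac{a_{2i}}{a_{1i}}}\;\frac{\sum_{j=1}^{k} \sqrt{a_{1j} a_{2j}}}{b_2 - a_2} \tag{14}$$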


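Similarly, we obtain the solution of the problem to minimize
$F_2(N_1, \ldots, N_k)$ subject to the side condition

$$F_1(N_1, \ldots, N_k) = b_1 \tag{15}$$

where $b_1$ is a constant:

$$N_i = \sqrt{\frac{a_{2i}}{a_{1i}}}\;\frac{b_1 - a_1}{\sum_{j=1}^{k} \sqrt{a_{1j} a_{2j}}} \tag{16}$$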
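The Lagrange solution can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it sets $a_1 = a_2 = 0$, uses hypothetical coefficients, and compares the value of $F_1$ at the closed form (14) with its value at randomly generated points that have been rescaled to satisfy the side condition (13).

```python
import random
from math import sqrt


def f1(N, a1):
    """Equation (11) with a_1 = 0."""
    return sum(p * n for p, n in zip(a1, N))


def f2(N, a2):
    """Equation (12) with a_2 = 0."""
    return sum(q / n for q, n in zip(a2, N))


def optimum_14(a1, a2, b2):
    """Closed-form minimizer of F_1 subject to F_2 = b2, equation (14)."""
    total = sum(sqrt(p * q) for p, q in zip(a1, a2))
    return [sqrt(q / p) * total / b2 for p, q in zip(a1, a2)]


# Hypothetical coefficients, chosen only for the check.
a1 = [10.0, 3.0, 1.0]
a2 = [3.2, 0.5, 0.1]
b2 = 0.8
best = optimum_14(a1, a2, b2)

random.seed(0)
not_better = 0
for _ in range(1000):
    trial = [random.uniform(0.1, 20.0) for _ in a1]
    scale = f2(trial, a2) / b2            # rescaling makes F_2 equal to b2
    feasible = [scale * n for n in trial]
    if f1(feasible, a1) >= f1(best, a1) - 1e-9:
        not_better += 1

print(abs(f2(best, a2) - b2) < 1e-12, not_better)
```

No feasible trial point gives a smaller $F_1$ than the closed form, which is what the Cauchy-Schwarz inequality guarantees for functions of this shape.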


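Now introduce the variables

$$n_1 = N_1, \qquad n_i = \frac{N_i}{N_{i-1}} \quad (i = 2, \ldots, k) \tag{17}$$

then $N_i = n_1 \cdots n_i$ $(i = 1, \ldots, k)$. Substituting the new variables in
(11) and (12), we obtain the functions

$$f_1(n_1, \ldots, n_k) = \sum_{i=1}^{k} a_{1i}\, n_1 \cdots n_i + a_1 \tag{18}$$

$$f_2(n_1, \ldots, n_k) = \sum_{i=1}^{k} \frac{a_{2i}}{n_1 \cdots n_i} + a_2 \tag{19}$$

Substituting (14) in (17), we find that the minimum solutions of
$f_1(n_1, \ldots, n_k)$ under the side condition $f_2(n_1, \ldots, n_k) = b_2$ are:

$$n_{11} = \sqrt{\frac{a_{21}}{a_{11}}}\;\frac{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}}{b_2 - a_2}$$

and

$$n_{1i} = \sqrt{\frac{a_{2i}\, a_{1,i-1}}{a_{1i}\, a_{2,i-1}}} \quad (i = 2, \ldots, k) \tag{20}$$

Similarly, substituting (16) in (17), we find the minimum solutions of
$f_2(n_1, \ldots, n_k)$ under the side condition $f_1(n_1, \ldots, n_k) = b_1$:

$$n_{21} = \sqrt{\frac{a_{21}}{a_{11}}}\;\frac{b_1 - a_1}{\sum_{i=1}^{k} \sqrt{a_{1i} a_{2i}}}$$

and

$$n_{2i} = \sqrt{\frac{a_{2i}\, a_{1,i-1}}{a_{1i}\, a_{2,i-1}}} \quad (i = 2, \ldots, k) \tag{21}$$

Note that $n_{1i} = n_{2i}$ $(i = 2, \ldots, k)$.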


b. Application to Optimum Allocation Problems in Sampling 
i. Nested Sampling 


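Substituting $a_{1i} = c_i$, $a_{2i} = \sigma_i^2$ and $a_1 = a_2 = 0$ in (18) and (19), we
obtain the 2 functions

$$g_1(n_1, \ldots, n_k) = \sum_{i=1}^{k} c_i\, n_1 \cdots n_i \tag{22}$$

$$g_2(n_1, \ldots, n_k) = \sum_{i=1}^{k} \frac{\sigma_i^2}{n_1 \cdots n_i} \tag{23}$$

These functions represent the general case of the cost function
$C(n_1, n_2, n_3)$ and the variance function $V(n_1, n_2, n_3)$ used above in section 4.
Setting $b_1 = c$ and $b_2 = v$ yields the corresponding side conditions.
Therefore applying formulas (20) and (21), we have as the minimum
solutions of $g_1(n_1, \ldots, n_k)$ under the side condition $g_2(n_1, \ldots, n_k) = v$

$$n_{11} = \frac{\sigma_1}{v\sqrt{c_1}} \sum_{i=1}^{k} \sigma_i \sqrt{c_i}$$

and

$$n_{1i} = \frac{\sigma_i}{\sigma_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}} \quad (i = 2, \ldots, k) \tag{24}$$

and as the minimum solutions of $g_2(n_1, \ldots, n_k)$ under the side condition
$g_1(n_1, \ldots, n_k) = c$

$$n_{21} = \frac{c\,\sigma_1}{\sqrt{c_1}\,\sum_{i=1}^{k} \sigma_i \sqrt{c_i}}$$

and

$$n_{2i} = \frac{\sigma_i}{\sigma_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}} \quad (i = 2, \ldots, k) \tag{25}$$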


Specializing equations (24) and (25) to the case k = 3 yields equa- 
tions (7) and (8). Specializing equation (25) to the case k = 2 and 
letting cost be expressed in terms of time, c, = kt, c. = t, gives equation 
10.32 in L. H. C. Tippett’s book [12]. 
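For an arbitrary number of levels, equations (24) and (25) are conveniently computed through the cumulative products $N_i = n_1 \cdots n_i$ of equation (17). The following is a minimal sketch; the function name, its keyword arguments, and the four-level inputs are hypothetical.

```python
from math import sqrt


def k_fold_allocation(sigma2, cost, *, v=None, c=None):
    """Class frequencies n_1..n_k from (24) if v is given, from (25) if c is given."""
    if (v is None) == (c is None):
        raise ValueError("give exactly one of v (variance) or c (total cost)")
    total = sum(sqrt(s * ci) for s, ci in zip(sigma2, cost))  # sum of sigma_i * sqrt(c_i)
    scale = total / v if v is not None else c / total
    # Cumulative products N_i = n_1 * ... * n_i, as in equations (14) and (16).
    big_n = [sqrt(s / ci) * scale for s, ci in zip(sigma2, cost)]
    return [big_n[0]] + [big_n[i] / big_n[i - 1] for i in range(1, len(big_n))]


def cost_of(n, cost):
    """Equation (22)."""
    total, units = 0.0, 1.0
    for n_i, c_i in zip(n, cost):
        units *= n_i
        total += c_i * units
    return total


def variance_of(n, sigma2):
    """Equation (23)."""
    total, units = 0.0, 1.0
    for n_i, s in zip(n, sigma2):
        units *= n_i
        total += s / units
    return total


# Hypothetical four-level example.
sigma2 = (4.0, 1.0, 0.25, 0.0625)
cost = (64.0, 16.0, 4.0, 1.0)
n = k_fold_allocation(sigma2, cost, c=212.5)
print(n, cost_of(n, cost), variance_of(n, sigma2))
```

Feeding the attained variance back in as `v` returns the same frequencies, which mirrors the remark that (24) and (25) differ only in the first level.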


ii. Nested Sampling with Some Prefixed Class Frequencies 


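Let $n'_1, \cdots, n'_{k'}$ be the unknown frequencies and $n_{k'+1}, \cdots, n_k$ be fixed beforehand. The equations (22) and (23) may then be rewritten in terms of $n'_1, \cdots, n'_{k'}$ as follows:

$$h_1(n'_1, \cdots, n'_{k'}) = \sum_{i=1}^{k'} c'_i\, n'_1 \cdots n'_i \qquad (26)$$

where

$$c'_i = c_i \quad (i = 1, \cdots, k' - 1) \quad\text{and}\quad c'_{k'} = c_{k'} + \sum_{j=k'+1}^{k} c_j\, n_{k'+1} \cdots n_j, \qquad (27)$$

and

$$h_2(n'_1, \cdots, n'_{k'}) = \sum_{i=1}^{k'} \frac{\sigma'^2_i}{n'_1 \cdots n'_i} \qquad (28)$$

where

$$\sigma'^2_i = \sigma_i^2 \quad (i = 1, \cdots, k' - 1) \quad\text{and}\quad \sigma'^2_{k'} = \sigma_{k'}^2 + \sum_{j=k'+1}^{k} \frac{\sigma_j^2}{n_{k'+1} \cdots n_j}. \qquad (29)$$

Thus the functions $h_1$ and $h_2$ of the variables $n'_1, \cdots, n'_{k'}$, given by (26) and (28), represent the same types of function as the functions $g_1$ and $g_2$ of the variables $n_1, \cdots, n_k$ given by (22) and (23). Therefore the minimum solutions of $h_1(n'_1, \cdots, n'_{k'})$ and $h_2(n'_1, \cdots, n'_{k'})$ under the side conditions $h_2(n'_1, \cdots, n'_{k'}) = v$ and $h_1(n'_1, \cdots, n'_{k'}) = c$ respectively, may be obtained from equations (24) and (25) by replacing $k$ by $k'$, $\sigma$ by $\sigma'$, and $c$ by $c'$, and then substituting back $\sigma'_j$ and $c'_j$ $(j = 1, \cdots, k')$ from equations (27) and (29).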

For k = 3, k' = 2 we obtain from (27) and (29)

$$c'_2 = c_2 + c_3 n_3, \qquad \sigma'^2_2 = \sigma_2^2 + \frac{\sigma_3^2}{n_3}.$$

The substitution of these values into (24) and (25) after replacement of $k$, $c$, $\sigma$ by $k'$, $c'$, $\sigma'$ gives the formulas (9) and (10) used above.

Note that the results of b. ii. may also be obtained from a. and then 
b. i. be considered as the special case k’ = k. 




iii. Stratified Sampling from Finite Populations 


We will indicate briefly the applicability of the above used general- 
ized functions to stratified sampling involving two levels. 

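Let there be $k$ strata in the population with $M_i$ elements $x_{ij}$ in the $i$-th stratum $(i = 1, \cdots, k;\ j = 1, \cdots, M_i)$. Assume that the $N_i$ sample elements $x_{ij}$ $(i = 1, \cdots, k;\ j = 1, \cdots, N_i)$ are independently drawn at random from the $k$ finite strata. Then the sample mean

$$\bar{x} = \frac{1}{M} \sum_{i=1}^{k} \frac{M_i}{N_i} \sum_{j=1}^{N_i} x_{ij}$$

has the variance

$$\sigma_{\bar{x}}^2 = \frac{1}{M^2} \sum_{i=1}^{k} M_i^2\, \frac{\sigma_i^2}{N_i} \cdot \frac{M_i - N_i}{M_i - 1}$$

where $M = \sum_{i=1}^{k} M_i$ and $\sigma_i^2$ denotes the variance between elements in the $i$-th stratum. Thus we have

$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{k} \frac{a_{2i}}{N_i} + a_2$$

where

$$a_{2i} = \frac{M_i^3 \sigma_i^2}{M^2 (M_i - 1)}, \qquad a_2 = -\frac{1}{M^2} \sum_{i=1}^{k} \frac{M_i^2 \sigma_i^2}{M_i - 1}.$$

Let $c_i$ be the cost per element in the $i$-th stratum and $c = \sum_{i=1}^{k} c_i N_i$ the total cost, then $c$ may be written $c = \sum_{i=1}^{k} a_{1i} N_i + a_1$ where $a_{1i} = c_i$ and $a_1 = 0$. Thus $c$ and $\sigma_{\bar{x}}^2$ correspond to the functions $F_1(N_1, \cdots, N_k)$ and $F_2(N_1, \cdots, N_k)$ respectively in (11) and (12). Therefore equations (14) and (16) give the desired minimum solutions where $b_1$ and $b_2$ determine the side conditions corresponding to (13) and (15). In case the populations in the strata are large $(M_i \approx M_i - 1)$, we obtain the well known optimum allocation formulas: for a prescribed variance $b_2$,

$$N_i = \frac{(M_i \sigma_i / \sqrt{c_i}) \sum_{j=1}^{k} M_j \sigma_j \sqrt{c_j}}{M^2 b_2 + \sum_{j=1}^{k} M_j \sigma_j^2},$$

and for a prescribed total cost $b_1$,

$$N_i = \frac{b_1\, M_i \sigma_i / \sqrt{c_i}}{\sum_{j=1}^{k} M_j \sigma_j \sqrt{c_j}}.$$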
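
As an illustration of the large-strata allocation formulas, here is a minimal sketch; it is not from the paper. The function names and the stratum sizes, standard deviations and unit costs are hypothetical, and the variance used for the check is the large-strata approximation, in which the factor (M_i - N_i)/(M_i - 1) is replaced by 1 - N_i/M_i.

```python
import math


def stratified_allocation(M, sigma, c, v=None, total_cost=None):
    """Sample sizes N_i per stratum under the large-strata approximation.

    Give `v` to minimise cost for a prescribed variance of the mean,
    or `total_cost` to minimise that variance for a prescribed cost.
    """
    if (v is None) == (total_cost is None):
        raise ValueError("give exactly one of v or total_cost")
    weights = [Mi * si / math.sqrt(ci) for Mi, si, ci in zip(M, sigma, c)]
    s = sum(Mi * si * math.sqrt(ci) for Mi, si, ci in zip(M, sigma, c))
    if v is not None:
        m_total = sum(M)
        denom = m_total ** 2 * v + sum(Mi * si ** 2 for Mi, si in zip(M, sigma))
        return [w * s / denom for w in weights]
    return [w * total_cost / s for w in weights]


def variance_of_mean(M, sigma, N):
    """Large-strata variance: (1/M^2) * sum M_i^2 sigma_i^2 / N_i * (1 - N_i/M_i)."""
    m_total = sum(M)
    return sum(Mi ** 2 * si ** 2 / Ni * (1 - Ni / Mi)
               for Mi, si, Ni in zip(M, sigma, N)) / m_total ** 2


# Hypothetical strata (illustrative numbers only).
M = [5000, 3000, 2000]
sigma = [4.0, 6.0, 10.0]
c = [1.0, 2.0, 4.0]
N_for_variance = stratified_allocation(M, sigma, c, v=0.05)
N_for_cost = stratified_allocation(M, sigma, c, total_cost=1500.0)
print(N_for_variance, variance_of_mean(M, sigma, N_for_variance))
print(N_for_cost, sum(ci * Ni for ci, Ni in zip(c, N_for_cost)))
```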






LITERATURE CITED


[1] Anderson, R. L. Use of Variance Components in the Analysis of Hog Prices in Two Markets, J. Am. Stat. Ass., 42: 612-634, 1947.
[2] Cassil, C. C., Wadley, F. M., and Dean, F. P. Sampling Studies on Orchard Spray Residues in the Pacific Northwest, J. of Econ. Entom., 36: 227-231, 1943.
[3] Crump, S. Lee. The Estimation of Variance Components in Analysis of Variance, Biometrics Bulletin, 2: 7-11, 1946.
[4] Daniels, H. E. The Estimation of Components of Variance, Supplement to the Journal of the Royal Statistical Society, 6: 186-197, 1939.
[5] Eisenhart, Churchill. The Assumptions Underlying the Analysis of Variance, Biometrics Bulletin, 3: 1-21, 1947.
[6] Fisher, R. A. The Design of Experiments, 3rd Edition. Oliver and Boyd, Ltd., Edinburgh and London, 1942.
[7] Ganguli, M. A Note on Nested Sampling, Sankhya, 5: 449-452, 1941.
[8] Hansen, M. H., Hurwitz, W. N., and Gurney, M. Problems and Methods in a Sample Survey of Business, J. Am. Stat. Ass., 41: 173-189, 1946.
[9] Mahalanobis, P. C. On Large Scale Sample Surveys, Philos. Transactions of the Royal Society, Series B, Biolog. Sciences, 23: 329-451, 1944.
[10] Snedecor, G. W. Statistical Methods Applied to Experiments in Agriculture and Biology, 4th Edition. The Collegiate Press, Inc., Ames, Iowa, 1946.
[11] Statistical Research Group, Columbia University. Selected Techniques of Statistical Analysis: For Scientific and Industrial Research and Production and Management Engineering. McGraw-Hill Book Company, Inc., New York, 1947.
[12] Tippett, L. H. C. The Methods in Statistics, 3rd Edition. Williams and Norgate, Ltd., London, 1940.




FITTING A STRAIGHT LINE WHEN BOTH VARIABLES 
ARE SUBJECT TO ERROR 


M. S. BARTLETT
University of Manchester, England 


INTRODUCTION 


A SIMPLE METHOD of fitting a straight line when both variables are subject
to error was examined by Wald (1) in 1940. The purpose of the
present note is to present and illustrate a modification of Wald’s method 
having the advantage in general of greater accuracy. Before any de- 
tailed exposition it will be as well to recall two important points: 


(i) a distinction must be made between the linear regression equation of
a variable y on a second variable x, and a linear functional relation
between two variables Y and X masked by errors. The former
equation is still available for prediction even if the variable x is subject
to error, but is not necessarily appropriate for a functional relation
when one exists.

(ii) it is possible to set up maximum likelihood equations for the second 
problem, but they do not lead to a unique solution without further 
assumptions, such as an assumption about the relative magnitude of 
the errors in x and y. 


These points have been emphasized by many previous writers, for 
example, by Wald (1) or more recently by Lindley (2). In view of (ii) 
it is useful to consider, in the common case when the observations have 
equal weight, the following elementary method: 


(a) For the location of the fitted straight line use as one point the mean
coordinates $\bar{x}$, $\bar{y}$, just as in the least-squares method.

(b) For the slope, first divide the n plotted points into three groups, the
equal numbers k in the two extreme groups being chosen to be as
near $\tfrac{1}{3}n$ as possible (the three groups are non-overlapping when
considered, say, in the x direction). The join of the mean coordinates
$\bar{x}_1$, $\bar{y}_1$ and $\bar{x}_3$, $\bar{y}_3$ for the two extreme groups is used to determine the
slope.
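
Steps (a) and (b) can be sketched in a few lines of code. This is an illustrative sketch only: the function name `three_group_fit` is hypothetical, ties in x are not handled specially, and the data are the six penicillin observations used in the numerical example later in this paper, with x taken as 1 to 6.

```python
def three_group_fit(x, y):
    """Return (intercept, slope) of the three-group line.

    The line passes through the overall means (step a); its slope joins
    the mean points of the two extreme groups of k points each, with k
    as near n/3 as possible (step b).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    k = round(n / 3)
    order = sorted(range(n), key=lambda i: x[i])  # rank points by x
    low, high = order[:k], order[-k:]

    def mean(indices, values):
        return sum(values[i] for i in indices) / len(indices)

    slope = (mean(high, y) - mean(low, y)) / (mean(high, x) - mean(low, x))
    intercept = sum(y) / n - slope * sum(x) / n
    return intercept, slope


x = [1, 2, 3, 4, 5, 6]
y = [15.87, 17.78, 19.52, 21.35, 23.13, 24.77]
intercept, slope = three_group_fit(x, y)
print(round(intercept, 3), round(slope, 3))
```

For these data the slope is 14.25/8, or 1.781 to three decimals.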


The only difference from Wald’s original method is the use of three 
groups instead of two, for reasons which will be apparent from the results 




of the next section.¹ It will also be shown that Wald’s confidence interval
method of assessing the accuracy (under suitable conditions) may be 
adapted to the present method. 


EFFICIENCY IN A SPECIAL CASE 


To get some idea of the efficiency of the method its accuracy is determined in a special case where least-squares is appropriate. It is assumed that observations y are available for n = 2l + 1 values x = X not subject to error and spaced at equidistant unit intervals. The least-squares estimate is known to provide the linear combination of the y’s providing an unbiased estimate of the true slope $\beta$ in the functional relation

(1) $Y = \alpha + \beta X$

with minimum variance when the differences y − Y are uncorrelated and of constant variance $\sigma^2$. The least-squares estimate

$$b = \frac{\sum y (x - \bar{x})}{\sum (x - \bar{x})^2}$$

has error variance $\sigma^2 / \sum (x - \bar{x})^2$, where $\sum (x - \bar{x})^2 = \tfrac{1}{3} l (l + 1)(2l + 1)$ in the situation assumed in this section.

For comparison the error variance of the estimate

(2) $b' = \dfrac{\bar{y}_3 - \bar{y}_1}{\bar{x}_3 - \bar{x}_1}$

of the last section is easily evaluated for any value of k. It is given by

$$\frac{2\sigma^2}{k(\bar{x}_3 - \bar{x}_1)^2} = \frac{2\sigma^2}{k(2l - k + 1)^2}.$$

The relative efficiency of b' is thus

(3) $E = \dfrac{3k(2l - k + 1)^2}{2l(l + 1)(2l + 1)}.$

This is a maximum when

$$(2l - k + 1)(2l + 1 - 3k) = 0$$

with relevant root $k = \tfrac{1}{3}(2l + 1) = \tfrac{1}{3}n$.


¹I am indebted to Prof. Gerhard Tintner for drawing my attention to a previous discussion of this problem, with a similar conclusion, by Nair and Shrivastava (4) (see also Nair and Banerjee (5)). It might be noted that these authors propose using the two extreme groups out of three for location as well as slope, but recommendation (a) above is theoretically preferable. In the first of these two papers the extension of the method to fitting higher-order curves is also considered, though the optimum efficiency is not so high in such cases.




FITTING A STRAIGHT LINE 209 


We then have

(4) $E = \tfrac{8}{9}\,(l + \tfrac{1}{2})^2 / \{l(l + 1)\} > \tfrac{8}{9}$,

which may be compared with $E = \tfrac{3}{4}\,(l + \tfrac{1}{2})^2 / \{l(l + 1)\} > \tfrac{3}{4}$ when $k = \tfrac{1}{2}n$. The higher efficiency of $k = \tfrac{1}{3}n$ compared with $k = \tfrac{1}{2}n$ suggests the adoption of $k = \tfrac{1}{3}n$ in preference to $k = \tfrac{1}{2}n$ in general. Indeed its high efficiency in the case examined above indicates the occasional value of the simple method proposed even in cases where the least-squares method is available.
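
The efficiency comparison can be verified with exact arithmetic. The sketch below is illustrative only: it assumes the efficiency formula E = 3k(2l − k + 1)² / {2l(l + 1)(2l + 1)} for n = 2l + 1 equally spaced x values, and the function name and the choice l = 10 are hypothetical.

```python
from fractions import Fraction


def relative_efficiency(k, l):
    """Efficiency of the grouped slope relative to least squares.

    Assumes n = 2l + 1 unit-spaced x values and extreme groups of k points.
    """
    return Fraction(3 * k * (2 * l - k + 1) ** 2,
                    2 * l * (l + 1) * (2 * l + 1))


l = 10                      # n = 21 observations
n = 2 * l + 1
best_k = max(range(1, l + 1), key=lambda k: relative_efficiency(k, l))
e_thirds = relative_efficiency(n // 3, l)   # three groups, k = n/3
e_halves = relative_efficiency(l, l)        # two groups, k as near n/2 as possible
print(best_k, float(e_thirds), float(e_halves))
```

With n = 21 the best group size is k = 7, with efficiency 49/55 (about 0.89), against 11/14 (about 0.79) for two groups of ten.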


ASSESSMENT OF ACCURACY IN THE GENERAL CASE 


In the general problem it is assumed that both y and x are subject to error. To use Wald’s confidence interval method it is assumed further that the n errors $\eta = y - Y$ are independently and normally distributed with constant variance $\sigma_\eta^2$; similarly the n errors $\epsilon = x - X$ are independent and normal with variance $\sigma_\epsilon^2$; the x and y errors are moreover mutually independent, so that the variance of $\eta - \beta\epsilon$ is $\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$.

Consider now possible ‘estimates’ of this last variance when $\beta$ is known. If we write for the total sums of squares and products of x and y within the three groups

$$S_{xy} = \sum\nolimits_1 (x - \bar{x}_1)(y - \bar{y}_1) + \sum\nolimits_2 (x - \bar{x}_2)(y - \bar{y}_2) + \sum\nolimits_3 (x - \bar{x}_3)(y - \bar{y}_3),$$

with $S_{xx}$ and $S_{yy}$ defined similarly, where $\sum_i$ denotes summation over the observations in the i-th group, then $(S_{yy} - 2\beta S_{xy} + \beta^2 S_{xx})/(n - 3)$ is an estimate of the variance $\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$ with n − 3 degrees of freedom. The remaining 3 degrees of freedom are contained in the three group means. One is represented by the general mean, one by the difference between the means of the first and third groups to be used in the estimate of $\beta$; the third is represented by the difference between the mean of the second group and the general mean of the first and third groups.

For data with few observations it is advisable to make use of the last 
degree of freedom in the variance estimate, as in the numerical example 
considered later. Alternatively, if it is not so used, it remains available 
for testing the linearity of the true X, Y relation. In the former case, 




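the appropriate square to be added to the numerator of the previous estimate is

$$\tfrac{1}{6}k\{(\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2)^2 - 2\beta(\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2)(\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2) + \beta^2(\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2)^2\}$$

and the estimate $s^2(\beta)$ obtained with n − 2 now as the divisor will have n − 2 degrees of freedom.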


Since

$$(\bar{x}_3 - \bar{x}_1)(b' - \beta) = (\bar{\eta}_3 - \beta\bar{\epsilon}_3) - (\bar{\eta}_1 - \beta\bar{\epsilon}_1),$$

when b' is given by (2), the left-hand quantity under the assumptions made in this section is normal with variance $(\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2)(2/k)$. This is subject to one qualification, that the errors in the x variable do not influence the allocation of the observations to the three groups. Such an
effect may be neglected in many problems, particularly when the errors 
are small compared with the spacing of the observations at the points 
of division between the three groups; it will not be considered further 
here. A more detailed consideration of this point has been given by 
Wald (1). 

Under the same assumptions we have

$$t = \frac{(\bar{x}_3 - \bar{x}_1)(b' - \beta)\sqrt{\tfrac{1}{2}k}}{s(\beta)}.$$

Although the denominator depends on $\beta$, this t-variate enables a confidence interval to be obtained for $\beta$. Thus for a value t corresponding to any chosen probability value we have the interval determined by the quadratic equation for $\beta$,

(5) $\tfrac{1}{2}k(\bar{x}_3 - \bar{x}_1)^2 (b' - \beta)^2 = t^2 (s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2)$,

where $s^2(\beta) = s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2$.

If required, a similar method may be used to provide a joint confidence region for $\alpha$ and $\beta$. If $a = \bar{y} - \beta\bar{x}$, then a is independent of the numerator of t and of $s(\beta)$, and hence

$$F = \frac{\tfrac{1}{2}\{n(a - \alpha)^2 + \tfrac{1}{2}k(\bar{x}_3 - \bar{x}_1)^2 (b' - \beta)^2\}}{s^2(\beta)}$$

is a variance ratio with degrees of freedom 2, n − 2. For any chosen probability value the corresponding critical value of F will determine an ellipse as the boundary of the confidence region for $\alpha$ and $\beta$. This may be compared with the corresponding region for the least-squares method if it is known that $\sigma_\epsilon^2 = 0$; this region is similarly obtained from the variance ratio

$$\frac{\tfrac{1}{2}\{n(a - \alpha)^2 + (b - \beta)^2 \sum (x - \bar{x})^2\}}{s^2},$$

where $s^2$ is the usual variance estimate of y − Y obtained from the residuals of y with n − 2 degrees of freedom.

If, as suggested earlier in this section, it is desired to examine the linearity of the functional relation, the variance estimate $s_{n-3}^2(\beta)$ of $\sigma_\eta^2 + \beta^2 \sigma_\epsilon^2$ with n − 3 degrees of freedom must be used. The further quantity

$$\frac{\{(\bar{y}_1 + \bar{y}_3 - 2\bar{y}_2) - \beta(\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2)\}\sqrt{\tfrac{1}{6}k}}{s_{n-3}(\beta)}$$

is then (if the linear relation is valid) also a t-variate with n − 3 degrees of freedom. It will be noticed that it involves the unknown slope $\beta$. When this is replaced by the estimate b', the resulting statistic is no longer exactly a t-variate, but might be treated approximately as such, especially when $\bar{x}_1 + \bar{x}_3 - 2\bar{x}_2$ is small compared with $\bar{x}_3 - \bar{x}_1$.


- 


NUMERICAL EXAMPLE 


As a numerical example consider fitting a straight line to the data on penicillin ‘assay’ given by Davies (3, § 6.12). Six different concentrations of pure penicillin were set up on a plate on which an agar medium containing B. subtilis had been spread, and the mean circle diameters of the zones of inhibition of growth of the organisms were measured (for further details of the technique see § 5.41 of (3)). The concentration had negligible error, so that the standard least-squares method was available, the relation between circle diameter and log concentration being linear. With circle diameter y in mm. and 1 penicillin unit per ml. as x = 1, and a two-fold increase in concentration as the unit for the x scale, the regression equation of y on x was


(6) Y = 20.403 + 1.782(x − 3.5) = 14.166 + 1.782x


with a 95% confidence interval for the slope, based on the usual t-statistic, of (1.732, 1.832).

It is stressed that the data are considered again here purely in order to illustrate the present method. The six observations are divided into three groups:


y    15.87  17.78,   19.52  21.35,   23.13  24.77   (Total 122.42)
x     1      2,       3      4,       5      6      (Total 21)


b' = {(24.77 + 23.13) − (17.78 + 15.87)} / {(6 + 5) − (2 + 1)} = 1.781.




Hence the estimated relation is

(7) Y = 20.403 + 1.781(X − 3.5) = 14.170 + 1.781X.


The sum of squares within each group has only one degree of freedom in this example, and may conveniently be calculated from the difference of the two observations per group. The other degree of freedom to be added is that for the contrast of the mean for the second group with the mean for the other two groups. This gives zero contribution for x, and for y


24.77 + 23.13 + 15.87 + 17.78 − 2(19.52 + 21.35) = −0.19


with appropriate divisor. Hence

$$4 s_y^2 = \tfrac{1}{2}\{(1.91)^2 + (1.83)^2 + (1.64)^2\} + \frac{(-0.19)^2}{12} = 4.8463,$$

$$4 s_{xy} = \tfrac{1}{2}\{1 \times 1.91 + 1 \times 1.83 + 1 \times 1.64\} + \frac{0 \times (-0.19)}{12} = 2.69,$$

$$4 s_x^2 = \tfrac{1}{2}\{1 + 1 + 1\} + 0 = 1.5.$$


Equation (5), with t = 2.78 for 4 degrees of freedom (P = 0.05), gives

$$16(1.781 - \beta)^2 = (2.78)^2 (4.8463 - 2\beta \times 2.69 + 1.5\beta^2)/4$$

or $13.1018\beta^2 - 2\beta(23.2987) + 41.3879 = 0$

or $\beta = 1.778 \pm 0.058$.


Thus the 95% confidence interval for $\beta$ by this method is (1.720, 1.836), an interval naturally slightly wider than the interval obtained by the least-squares method, since the assumption of no error in x has been dropped.
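
The interval can be reproduced by solving the quadratic in the slope directly. The following is a sketch under stated assumptions, not the author's own computation: the function name is hypothetical, the three groups are assumed to be of equal size k = n/3, the extra degree of freedom for the middle-group contrast is added with coefficient k/6, and the critical value t = 2.78 is supplied by hand rather than looked up.

```python
import math


def bartlett_slope_interval(x, y, t_crit):
    """Confidence limits for the slope from the three-group method.

    Solves  (k/2)(xbar3 - xbar1)^2 (b' - beta)^2 = t^2 s^2(beta)
    for beta, where s^2(beta) has n - 2 degrees of freedom.
    """
    n = len(x)
    if n % 3 != 0 or n != len(y) or n < 6:
        raise ValueError("this sketch assumes three equal groups, n >= 6")
    k = n // 3
    order = sorted(range(n), key=lambda i: x[i])
    groups = [order[:k], order[k:2 * k], order[2 * k:]]
    xm = [sum(x[i] for i in g) / k for g in groups]
    ym = [sum(y[i] for i in g) / k for g in groups]

    # Within-group sums of squares and products (n - 3 degrees of freedom).
    sxx = sum((x[i] - xm[j]) ** 2 for j, g in enumerate(groups) for i in g)
    syy = sum((y[i] - ym[j]) ** 2 for j, g in enumerate(groups) for i in g)
    sxy = sum((x[i] - xm[j]) * (y[i] - ym[j])
              for j, g in enumerate(groups) for i in g)

    # Add the contrast of the middle group against the two extreme groups.
    cx = xm[0] + xm[2] - 2 * xm[1]
    cy = ym[0] + ym[2] - 2 * ym[1]
    sxx += k / 6 * cx * cx
    syy += k / 6 * cy * cy
    sxy += k / 6 * cx * cy

    slope = (ym[2] - ym[0]) / (xm[2] - xm[0])
    w = 0.5 * k * (xm[2] - xm[0]) ** 2
    q = t_crit ** 2 / (n - 2)
    # Quadratic a*beta^2 - 2*b*beta + c = 0.
    a = w - q * sxx
    b = w * slope - q * sxy
    c = w * slope ** 2 - q * syy
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        raise ValueError("the quadratic does not give a finite interval")
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


x = [1, 2, 3, 4, 5, 6]
y = [15.87, 17.78, 19.52, 21.35, 23.13, 24.77]
lower, upper = bartlett_slope_interval(x, y, t_crit=2.78)
print(round(lower, 3), round(upper, 3))
```

The limits agree with the interval quoted above to within rounding; small differences in the third decimal arise because the text rounds b' to 1.781 before solving.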


REFERENCES 


(1) Wald, A. The Fitting of Straight Lines if Both Variables are Subject to Error. 
Ann. Math. Stat. 11, 284, 1940. 

(2) Lindley, D. V. Regression Lines and the Linear Functional Relationship. J. 
Roy. Stat. Soc. (Suppl.) 9, 218, 1947. 

(3) Davies, O. L. (Editor). Statistical Methods in Research and Production. Oliver 
and Boyd, 1947. 

(4) Nair, K. R. and Shrivastava, M. P. On a Simple Method of Curve Fitting.
Sankhyā 6, 121, 1942.

(5) Nair, K. R. and Banerjee, K. S. A Note on Fitting of Straight Lines if Both
Variables are Subject to Error. Sankhyā 6, 331, 1942.




RELATIONSHIP OF CATCH TO CHANGES IN POPULATION 
SIZE OF NEW ENGLAND HADDOCK 


By HOWARD A. SCHUCK
Aquatic Biologist 


Fish and Wildlife Service 
United States Department of the Interior 


INTRODUCTION 


THE UNITED STATES catch of haddock has fluctuated considerably
throughout the years and these fluctuations have generally been of
a declining nature. In 1929, the catch was about 260 million pounds,
and in recent years it has averaged only about 150 million pounds.
Fluctuations in the catch have been due in large part to variations in actual
abundance, or the size of the stock of commercial sizes of haddock on the
banks in different years. We are, therefore, interested in obtaining an
accurate measure of the size and the composition of the stock, to measure
its changes throughout the years, and to determine what factors have
been most responsible for such changes. Changes in the stock from year
to year are the result of varying rates of removals and additions.
Therefore, besides determining the size of the stock in different years, it is
necessary to measure the yearly removals from the stock by catch and
natural mortality, and the yearly additions by recruitment and growth.

If these variables could be measured accurately, we should be in a 
position to evaluate their relative importance in determining the size of 
the stock and to determine whether the size of the spawning stock and 
of other stocks affects recruitment. With such information and other 
general life history facts, it should be possible to determine at what 
level the stock should be maintained, to determine what mode and 
intensity of fishing will result in the maximum sustained production of 
haddock, and to make periodic predictions as to future production of the 
fishery for the industry. 




A basic equation is:


$S + (G + R + M) - (C + N + M_1) = S_1$

where:


$S$ = size of population at the beginning of the year.
$S_1$ = size of population at the end of the year.
$G$ = additions to the population during the year by growth.
$R$ = additions to the population during the year by recruitment of young.
$M$ = additions due to immigrations.
$C$ = deductions from the population during the year by the fishery.
$N$ = deductions from the population during the year due to natural mortality.
$M_1$ = deductions due to emigrations.


FIGURE 1.
LOCATION OF FISHING BANKS OFF NEW ENGLAND, NOVA SCOTIA, AND
NEWFOUNDLAND.


It is believed that the population of haddock inhabiting the New
England Banks (Georges) (Fig. 1) is largely independent of the populations
on the Nova Scotian and Newfoundland Banks. Assuming this
to be true, and if we consider the population on Georges Bank only,
there will be no important changes in the stock from year to year due to




CATCH RELATIONSHIP TO CHANGE IN POPULATION SIZE 215 


TABLE 1 
RELATIVE SIZE OF THE GEORGES BANK HADDOCK POPULATIONS IN TERMS OF 
THE AVERAGE NUMBERS OF FISH PER DAY TAKEN BY A STANDARD GROUP OF 
OTTER TRAWLERS 


Numbers 
Year per day 
1931 3,032 
1932 4,324 
1933 3,630 
1934 4,049 
1935 4,927 
1936 5,590 
1937 4,404 
1938 4,833 
1939 5,502 
1940 4,979 
1941 6,960 
1942 7,941 
1943 7,319 
1944 5,737 
1945 5,347 
1946 4,956 
1947 4,954 
Average 5,205 


immigrations or emigrations; and $M$ and $M_1$ can be left out of the
equation.

Also, if we consider the population as numbers, rather than pounds
of fish, $G$, or “growth”, can be left out too. Furthermore, if we define
the population $S$ as being the number of fish of certain year classes at
the beginning of a year and $S_1$ as the number of fish of the same year
classes at the end of that year, then there can be no recruitment; and $R$
can also be ignored. Thus, the equation for certain purposes can be
reduced to:

$S - (C + N) = S_1$


Available for use in this equation are biological and statistical data 
for the Georges Bank population going back to 1931. These data were 
assembled by the Haddock Investigation of the United States Fish and 
Wildlife Service and its predecessor agency, the United States Bureau of 
Fisheries. 

The remainder of this paper will be devoted to: (1) developing an
index representing the size of the population in terms of numbers of


FIGURE 2.
RELATIVE SIZE OF THE POPULATION, IN TERMS OF THOUSANDS OF FISH PER DAY,
BY YEARS, 1931-1947.


haddock of definite ages and year classes, at the beginning and end of 
yearly periods (S and S,); (2) measuring the fishery removals (C) of 
haddock of each age during each of the 17 years, 1931-47; and (3) 
determining how important the yearly fishery removals are in decreasing 
the stock from the beginning to the end of yearly periods. 


SIZE OF THE STOCK OR “$S$” AND “$S_1$”


Total catch represents fishing removals and in itself is a vital piece 
of information. It does not, however, represent abundance, or the rela- 
tive size of the population on the Bank, inasmuch as the amount of fish- 
ing effort utilized to make the catch varies among years. 

The index representing the relative size of the population that was
used in the Haddock Investigation is the average yearly catch per day¹
of a standard group of large otter trawlers which fished out of Boston


¹Details of this abundance analysis were developed by W. C. Herrington and G. A. Rounsefell.




during this 17-year period. The relative size of the population² was first
expressed in terms of the average number of pounds per day taken by
these trawlers in each year. By the use of yearly average weight data,
the statistics on relative population size were converted from pounds to
numbers of fish (Table 1, Fig. 2).

In each year and season a sample of the haddock that were landed 
had been obtained, and from those fish obtained, scales had been collected.
Then, for each year and season, the ages of these sample fish
were determined. This determination was made by studying the projected
impression of these scales. Figure 3 shows a photograph of such
a microprojection of a scale from a Georges Bank haddock. 

The fish were aged as having completed their first, second, third, 
fourth, fifth, sixth, seventh, eighth, and ninth year of life, and were 
correspondingly classified as fish of 1 to 9 years of age. The category 
of 9-year-olds includes 9-year-old and older fish. (The number of
haddock of ages greater than 9 years was very small, amounting in the 
aggregate to less than one-half of one percent of all haddock in the 
catch.) 

By using the percentage age composition that had been computed for
haddock of each length and for each year and season, the total numbers
of fish caught per day were reduced to numbers per day of each age
(Table 2 and Fig. 4). The average abundance for all 17 years (Fig. 5)
amounted to 116 one-year-olds, 1,472 two-year-olds, 1,571 three-year-olds,
920 four-year-olds, 557 five-year-olds, 324 six-year-olds, 149 seven-year-olds,
61 eight-year-olds and 34 nine-year-old and older fish. It can
be concluded that the relative abundance of fish in the catch diminishes 
quite regularly for those fish three years old and older. The fact that 
the one- and two-year-old fish are less abundant indicates that these age 
groups are not fully available to the fishery. 

In order to measure the diminution in the stock over the period of a 
year, it was desired to compare the relative population size of the fully- 
available age groups of fish at the beginning of each year with the size 
of the corresponding stock at the end of the year. Table 2 gives the 
average population size for the “haddock” year.³ In order to obtain a


²Where “population size” is mentioned in the remainder of this paper it refers to this index of
relative population size. Although it has not yet been possible to determine the exact relationship
between the actual number or pounds of fish in the stock and our calculated index of relative population
size, the index appears to suffice for the purpose used here.
³The “haddock” year consists of seasons A, B, C, and D as follows:

A—February, March, and April (spawning season) 

B—May, June, and July 

C—August, September, and October

D—November, December, and January 


FIGURE 3.
PHOTOGRAPH OF A SCALE FROM A HADDOCK THAT HAD JUST COMPLETED ITS
FOURTH YEAR WHEN CAUGHT APRIL 1939 ON GEORGES BANK. THE MARKS INDICATE
THE COMPLETION OF EACH YEAR OF GROWTH. THE LENGTH OF THIS FISH
WAS 22 3/4 INCHES.
{ 


TABLE 2 


RELATIVE POPULATION OF EACH AGE OF GEORGES BANK HADDOCK IN TERMS OF
NUMBERS CAUGHT PER DAY


                                        Age in years
         All     ----------------------------------------------------------------
Year     ages       1       2       3       4       5      6      7      8   9 and
                                                                             older
1931     3,032     147     691     158     439     699    466    256    132     44
1932     4,324      11     210   2,829     275     413    323    146     74     43
1933     3,630      44     986     720   1,145     249    193    145     67     81
1934     4,049     141     966   1,108     690     678    241    125     60     40
1935     4,927     202   1,704   1,306     574     509    428     97     74     33
1936     5,590     157   1,752   1,834     920     402    236    222     41     26
1937     4,404     150   1,233   1,327     698     535    251    119     65     26
1938     4,833     165   2,590     988     489     234    199    114     31     23
1939     5,502      95   1,775   2,416     640     271    123    108     42     32
1940     4,979     524   1,116   1,689   1,018     309    184     93     28     18
1941     6,960     144   3,298   1,275   1,046     752    233    123     40     49
1942     7,941      94   3,036   2,567   1,037     624    362    158     36     27
1943     7,319      11   1,026   3,470   1,551     530    492    149     61     29
1944     5,737      14     135   1,412   2,609     948    416     95     91     17
1945     5,347      25   1,663     420   1,244   1,218    485    194     61     37
1946     4,956      24     856   1,992     400     854    562    217     49      2
1947     4,954      18   1,996   1,189     863     250    314    180     89     55
Total   88,484   1,966  25,033  26,700  15,638   9,475  5,508  2,541  1,041    582
Avg.     5,205     116   1,472   1,571     920     557    324    149     61     34


value of the population size at the beginning of the year while eliminating 
the effect of the seasonal cycle in availability it was necessary to recom- 
pute these data. 

All data originally had been computed on a seasonal basis: for exam- 
ple, Table 3 shows the seasonal population size data from which Table 2 
was derived. In order to obtain values for the population size that more 
closely represent values at the beginning of each year,⁴ the abundance
values for seasons C, D, A, and B in Table 3 were averaged. In this 
recombination it was necessary to consider that 3-year-old fish in seasons 
C and D become 4-year-old fish in seasons A and B of the following 
year and that other ages progress accordingly. 

For example, to obtain the relative size of the population of 4-year-old
fish at the beginning of 1935 the following figures⁵ were used:


⁴It is recognized that by summarizing values for 4 seasons and dividing by 4, the average obtained
does not under some conditions represent the average of the midpoint and thus the exact beginning of 
the year. For the purpose of this analysis, however, such a calculation represents the beginning of the 
year accurately enough. 

⁵An exception to this rule was made in computing the relative population of 9-year-old haddock
at the beginning of the year. Since this group includes all older haddock, 8-year-old and 9-year-old 
haddock from seasons C and D were added to 9-year-old haddock from A and B and the total 
of these 6 figures, instead of the usual 4, was divided by 4 to give the average. 


FIGURE 4.
RELATIVE SIZE OF HADDOCK POPULATION OF AGES 1-9, FOR EACH OF THE 17 YEARS,
1931-1947.


                                            Number of fish
                                               per day
3-year-old haddock, season C, 1934 ........     1,425
3-year-old haddock, season D, 1934 ........       431
4-year-old haddock, season A, 1935 ........       583
4-year-old haddock, season B, 1935 ........       889
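
The recombination described above is a plain average of four seasonal figures, taken across the year boundary so that the same year class is followed. A small worked sketch (the function name is hypothetical; the four inputs are the figures listed above):

```python
def beginning_of_year_index(season_c, season_d, season_a, season_b):
    """Average of seasons C and D of one year with A and B of the next.

    The C and D figures are for fish one year younger, since those fish
    move up one age class when the new year begins.
    """
    return (season_c + season_d + season_a + season_b) / 4


# 4-year-old haddock at the beginning of 1935.
index_1935_age4 = beginning_of_year_index(1425, 431, 583, 889)
print(index_1935_age4)  # 832.0 fish per day
```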


Using this system the relative sizes of the population of 4- to 9-year- 
old fish at the beginning of each year were computed and are shown in 


FIGURE 5.
RELATIVE POPULATION SIZE OF HADDOCK OF AGES 1-9. AVERAGE OF ALL 17 YEARS.


Table 4. Since computation of the catch-per-day of 3-year-old fish at 
the beginning of the year involved use of figures for the less available 
2-year-old fish in seasons C and D, it was decided to omit the 3-year 
group. 

The next step was to decide whether to consider the age groups sepa- 
rately or in the aggregate. Examination of the data in Table 3 indicated 
that the decrease in catch-per-day for individual year classes from year 
to year was rather variable, hence, it was desirable to combine age groups. 
Thus the totals of all fish of ages 4 to 9 years at the beginning of each
year are shown in the right-hand column of Table 4.

It was next necessary to compute the size of the population at the 
end, in addition to at the beginning, of each year. The average popula- 
tion at the beginning of the year or seasons (C + D + A + B)/4, 
approximates the value of the midpoint between D and A. Therefore, 
values for the number of fish at the beginning of the year are the same as 
values for the number at the end of the preceding year. 

For example, from Table 4, if there are 1,793 five-year-old fish per
day at the beginning of 1945, then 914 of them (the number of 6-year-olds
at the beginning of 1946) remain at the end of 1945.
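
In terms of the reduced equation S − (C + N) = S₁, this example fixes the combined removals C + N for the year class. A minimal sketch, with a hypothetical function name and using only the two figures quoted above:

```python
def yearly_decrease(s_begin, s_end):
    """Return (removals, survival) for one year class over one year.

    removals = C + N = S - S1; survival = S1 / S.
    The split of removals between catch and natural mortality is not
    determined by these two figures alone.
    """
    removals = s_begin - s_end
    survival = s_end / s_begin
    return removals, survival


# Five-year-olds at the beginning of 1945 and the same fish a year later.
removals, survival = yearly_decrease(1793, 914)
print(removals, round(survival, 3))
```

Roughly half of the index for this year class remains after one year.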


TABLE 3 

RELATIVE SIZE OF POPULATION OF EACH AGE OF GEORGES BANK HADDOCK
BY SEASONS IN NUMBERS CAUGHT PER DAY

Age in years 

Year | Season 

; No. all 1 2 3 4 5 6 7 8 9 and 

years older 

1931 4 3,268 30 193 560 | 1,088 781 372 155 89 

: B 3,182 81 70 770 | 1,139 630 336 119 37 
q Cc 2,562 897 186 245 348 353 269 224 40 
. D 3,114] 587 | 1,755 183 179 222 99 45 2 12 

: 1932; A 4,281 11 | 2,418 489 707 387 143 80 46 
| 4,937 38 | 3,333} 359 | 410] 351] 275 95 76 
7 Cc 5,848 3 430 | 4,253 149 343 458 96 91 25 
; D 2,231 41 361 | 1,310 103 191 96 70 32 27 
y 1933 4 3,697 112 254 | 1,728 411 423 324 183 262 
ae B 4,349 1,318 494 | 1,814 277 185 161 56 44 
: Cc 4,487 39 | 1,724 | 1,461 875 207 93 63 17 8 

D 1,988 138 789 671 164 100 72 31 12 11 
1934| A 3,729 4 | 1,360 471 874 574 198 150 98 
7 B 4,299 290 | 1,217 | 1,368 866 209 252 41 56 
Cc 4,619 1,929 | 1,425 544 502 148 33 37 1 
D 3,549 565 | 1,640 431 377 472 33 17 10 4 
i 1935, A 3,215 23 884 583 748 607 185 86 99 
B 5,536 1,117 | 1,719 889 740 884 52 131 4 
Cc 5,495 16 | 2,443 | 1,848 590 367 152 67 9 3 
D 5,462 791 | 3,235 773 234 179 71 84 71 24 
1936) A 5,827 267 | 2,078 | 1,688 903 397 356 47 91 

B 7,217 4 | 1,760 | 3,118 997 517 341 377 103 
; c 6,171 93 | 3,618 | 1,537 692 41 69 110 2 9 
D 3,143 532 | 1,361 603 303 145 138 45 11 5 
1937; A 5,224 1 423 | 1,810 | 1,167 988 500 176 137 22 

; B 4,969 .. | 1,068 | 1,858 949 552 282 161 54 45 
: C 5,175 361 | 2,757 | 1,265 439 243 24 51 21 14 
F D 2,247 237 685 375 237 356 196 90 46 25 
a 1938 4 sio7s |... 363 | 1,015 694 341 390 159 56 60 
ke B 4,736 | ... | 2,133 | 1,233 614 288 270 178 13 7 
nag Cc 7,425] ... | 5,500] 1,241 384 162 54 52 26 6 
< D 4,092 660 | 2,363 463 264 144 81 68 28 21 
1939} A si... 287 | 2,260 873 515 191 179 66 92 

; B 6,629 | ... | 1,604 | 3,371 958 251 223 143 66 13 
Cc 6,848 30 | 3,057 | 3,157 424 148 16 13 3 eae 
D 4,066 349 | 2,150 877 307 174 56 96 34 23 


Also, the aggregate population of 4-year-old and older fish at the
beginning of a year would produce survivors at the end of the year
which would amount to the number of 5-year and older fish at the beginning
of the next year. For example, if the total population of 4-year-old
and older fish at the beginning of 1944 was the total of 3,188 four-year-




TABLE 3—Continued 


Ages in years 


Year | Season 
No. all i 2 3 4 5 6 7 8 
years 
1940 A 2,805 sae 127 | 1,057 | 1,046 337 141 52 24 
B 6,245 a 1,200 | 2,449 | 1,521 509 234 268 42 
Cc 5,638 221 | 2,175 | 1,832 922 161 289 10 23, 
D 5,228 | 1,876 961 | 1,415 582 230 73 43 24 
1941 A 5,855 1} 1,463 | 1,735 | 1,289 | 1,003 181 135 22 
B 7,692 ee 2,165 | 1,732 | 1,639 | 1,222 497 209 96 
 C 9,082 128 | 6,498 | 1,075 757 381 123 81 4 
D 5,210 448 | 3,068 557 498 403 131 66 19 
1942 A 5,863 290 | 2,599 | 1,083 | 1,006 598 224 35 
B 8,769 1,636 | 3,780 | 1,674 858 444 244 74 
C 9,058 39 | 4,875 | 2,898 703 212 232 93 3 
D 8,074 336 | 5,344 989 688 421 173 73 31 
1943 A 7,067 116 | 3,282 | 2,069 6838 668 102 1l4 
8,247 wer 380 | 4,048 | 1,844 831 689 287 97 
Cc 7,316 eye 1,892 | 3,353 | 1,277 420 291 59 7 
D 6,647 45 | 1,716 | 3,194 | 1,013 173 320 146 29 
1944 A 7,107 Joa 1 | 1,207 | 3,622 ! 1,221 680 158 202 
B 6,029 are 35 | 1,396 | 2,586 | 1,385 448 64 sl 
¢ 6,792 nee: 309 | 2,362 | 3,055 730 224 78 34 
D 3,020 54 193 681 | 1,174 453 312 80 48 
1945 A 4,687 oer 33 499 | 1,023 | 1,851 762 351 138 
B 5,640 ey 1,555 374 | 1,862 | 1,097 469 193 12 
c 7,293 34 | 3,277 668 | 1,502 | 1,133 462 162 34 
D 3,769 67 | 1,783 138 589 792 248 69 58 
1946 A 4,768 ve 45 | 1,940 487 | 1,238 804 140 113 
B 6 ,630 ae 735 | 3,040 333 | 1,059 928 446 80 
Cc 4,403 45 | 1,306 | 1,723 310 535 268 215 
D 4,024 52 | 1,334 | 1,270 471 982 246 69 
1947 A 4,382 58 | 1,253 | 1,628 394 507 293 149 
B 3,589 1,052 | 1,015 600 264 337 175 94 
4 8,122 1 | 4,904 | 1,831 782 169 253 115 41 
D 3,721 73 | 1,969 658 442 171 159 136 z 
Avg. A 4,666 215 | 1,521 | 1,206 842 505 209 103 
all B 5,806 1,069 | 2,015 | 1,223 721 436 225 74 
years Cc 6,255 59 | 2,799 | 1,889 805 359 206 92 35 
D 4,093 403 | 1,807 858 448 307 147 72 33 


olds, 1,224 five-year-olds, 432 six-year-olds, 208 seven-year-olds, 122 
eight-year-olds, and 26 nine-year-old and older fish, or a total of 5,200; 
then the survivors from this group of year classes, after an interval of 
one year, would be the number of 5- to 9-year-old and older fish at the 
beginning of 1945, or 1,793 five-year-olds, 604 six-year-olds, 270 seven- 






TABLE 4 
SIZE OF HADDOCK POPULATION AT BEGINNING OF EACH YEAR¹ 


                              Age in years 

Year           4        5        6        7        8    9 and      Total 
                                                        older 

1932         304      385      327      218      122      108      1,464 
1933       2,276      235      286      260      101      120      3,278 
1934         993      695      272      154       71       51      2,236 
1935         832      602      616      104       67       39      2,260 
1936       1,326      561      321      239       75       50      2,572 
1937       1,064      634      242      136       86       24      2,186 
1938         737      326      315      139       52       43      1,612 
1939         884      354      180      114       63       46      1,641 
1940       1,650      394      174       98       44       26      2,386 
1941       1,544      932      267      176       43       58      3,020 
1942       1,097      780      456      181       64       42      2,620 
1943       1,950      728      498      198       94       39      3,507 
1944       3,188    1,224      432      208      122       26      5,200 
1945       1,482    1,793      604      270       77       52      4,278 
1946         406    1,097      914      324      106       37      2,884 
1947       1,305      360      490      246      132       38      2,571 
Total     21,038   11,100    6,394    3,065    1,319      799     43,715 
Average    1,315      694      400      192       82       50      2,733 


¹Values are the average of the number of fish of the particular age from Seasons A and B of the 
year in question, and of the number of fish of 1 year younger from Seasons C and D of the preceding year. 


year-olds, 77 eight-year-olds, and 52 nine-year-old and older fish, or a 
total of 4,278. 

Having already obtained the total number of 4-year-old and older 
fish at the beginning of the year (from Table 4), and having now com- 
puted the total number of 4-year-old and older fish at the end of each 
year (the number of 5-year-old and older at the beginning of the next 
year), all such totals were entered in Table 5. 

Computation of the yearly diminution of the stocks being measured 
was then only a matter of subtracting the value representing the stock 
at the end of the year from the value representing the stock at the begin- 
ning of each year. 
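
The subtraction described above can be sketched in a few lines of Python. This is only an illustration, not the author's worksheet: the dictionary name `table4` and the function `yearly_decrease` are hypothetical, and the figures are the 4-year-old and total columns of Table 4.

```python
# Table 4 values: year -> (number of 4-year-olds, total of ages 4 to 9 and older),
# both taken at the beginning of the year.
table4 = {
    1932: (304, 1464), 1933: (2276, 3278), 1934: (993, 2236),
    1935: (832, 2260), 1936: (1326, 2572), 1937: (1064, 2186),
    1938: (737, 1612), 1939: (884, 1641), 1940: (1650, 2386),
    1941: (1544, 3020), 1942: (1097, 2620), 1943: (1950, 3507),
    1944: (3188, 5200), 1945: (1482, 4278), 1946: (406, 2884),
    1947: (1305, 2571),
}


def yearly_decrease(year):
    """Return (stock at beginning, stock at end, decrease) for one year."""
    begin = table4[year][1]
    age4_next, total_next = table4[year + 1]
    # Survivors at the end of the year are the fish aged 5 and older
    # at the beginning of the following year.
    end = total_next - age4_next
    return begin, end, begin - end


for year in range(1932, 1947):
    print(year, *yearly_decrease(year))
```

For 1944 the sketch returns 5,200 at the beginning, 2,796 at the end, and a decrease of 2,404, as in the worked example. For 1941 it gives an end-of-year value of 1,523 where Table 5 prints 1,525; the small difference may be due to rounding in the underlying averages.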


THE FISHERY REMOVALS OR “C” 


A measure of the yearly decreases in population size of completely 
available fish from year to year thus had been obtained. Inasmuch as 




TABLE 5 


RELATIVE SIZE OF POPULATION OF CERTAIN AGES OF HADDOCK AT THE 
BEGINNING AND END OF EACH OF 15 YEARS 


            Number of 4-       Number of 4- 
            to 9-year-olds     to 9-year-olds 
Year        at beginning       at end             Decrease 
            of year            of year¹ 
1932            1,464              1,002               462 
1933            3,278              1,243             2,035 
1934            2,236              1,428               808 
1935            2,260              1,246             1,014 
1936            2,572              1,122             1,450 
1937            2,186                875             1,311 
1938            1,612                757               855 
1939            1,641                736               905 
1940            2,386              1,476               910 
1941            3,020              1,525             1,495 
1942            2,620              1,557             1,063 
1943            3,507              2,012             1,495 
1944            5,200              2,796             2,404 
1945            4,278              2,478             1,800 
1946            2,884              1,266             1,618 
Total          41,144             21,519            19,625 
Average         2,743              1,435             1,308 


¹End of year = number of 5- to 9-year-olds at beginning of following year. 


the purpose of this study was to determine to what extent such decreases 
were associated with, or were the result of, the removals by the fishery, 
it was necessary next to determine how many fish the fishery had taken 
from the population in the various years. 

The fishery removals for the years 1931-47* were first tabulated in 
terms of pounds of fish. Since the average weights of the fish that were 
landed were also known, the total numbers caught were easily computed. The 
total pounds and numbers are shown in Table 6, and the total numbers 
in Figure 6. 

The numbers caught were then reduced to numbers of each age by 
utilizing the percentage-age compositions referred to earlier. After 
summarization by size groups and season, the number of fish of each age 
removed by the fishery in each of the 17 years was obtained; these numbers 
are shown in Table 7. 


*The landings for the ports of Boston, Gloucester, New Bedford, Mass., and Portland, Me. 




TABLE 6 
TOTAL CATCH OF HADDOCK FROM NEW ENGLAND BANKS 

              Millions       Millions 
Year          of pounds      of fish 
1931            101.801        34.979 
1932             86.706        32.348 
1933             70.272        26.623 
1934             39.683        15.617 
1935             68.579        28.565 
1936             73.496        31.489 
1937             83.973        32.528 
1938             80.202        33.570 
1939             91.181        38.911 
1940             81.676        31.345 
1941            111.611        46.944 
1942             97.786        41.299 
1943             80.215        33.036 
1944             84.265        29.062 
1945             65.284        22.091 
1946             90.802        32.678 
1947             98.082        38.931 
Total         1,405.614       550.016 
Average          82.683        32.354 


DECLINE IN THE SIZE OF STOCK AS ASSOCIATED WITH 
VARIATIONS IN THE CATCH. 

In an earlier section of this paper the yearly declines in the relative 
size of the stocks of those ages of haddock that were fully available to 
the fishery were computed (Table 5). In the section just completed, the 
catch of fish of each age in each year has been computed (summarized in 
Table 7). By summing the catches of fish of 4 to 9 years of age inclusive, 
in each year, the numbers of fish that were removed from the corre- 
sponding stock between the beginning and the end of the years involved 
were computed. Thus, in Table 8 data are presented which represent: 


(1) the decrease in the relative size of the stocks of 4- to 9-year-old 
    fish from the beginning to the end of each of the 15 years (1932- 
    46) in thousands of fish per day, and 
(2) the number of fish removed by the fishery from these stocks 
    during each of these 15 yearly intervals, in millions. 
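
The summation of the catches of ages 4 to 9 can be illustrated with a short Python sketch. The variable names are hypothetical; the age-specific catches are the age 4 through "9 and older" columns of Table 7, and `table8_catch` holds the catch column of Table 8 for comparison.

```python
# Catch in millions of fish for ages 4, 5, 6, 7, 8 and "9 and older" (Table 7).
catch_by_age = {
    1932: [2.008, 3.051, 2.383, 1.079, 0.553, 0.324],
    1933: [8.648, 1.791, 1.346, 1.030, 0.464, 0.550],
    1934: [2.889, 2.518, 0.825, 0.482, 0.197, 0.133],
    1935: [3.138, 2.467, 2.053, 0.415, 0.360, 0.089],
    1936: [4.629, 1.803, 1.153, 1.140, 0.217, 0.099],
    1937: [4.890, 3.574, 1.608, 0.815, 0.416, 0.188],
    1938: [3.304, 1.568, 1.312, 0.765, 0.198, 0.143],
    1939: [4.383, 1.807, 0.804, 0.695, 0.272, 0.200],
    1940: [7.261, 2.188, 1.286, 0.653, 0.191, 0.117],
    1941: [7.389, 5.303, 1.632, 0.860, 0.280, 0.353],
    1942: [5.938, 3.648, 2.131, 0.941, 0.205, 0.162],
    1943: [7.423, 2.742, 2.385, 0.688, 0.313, 0.157],
    1944: [13.047, 4.945, 1.991, 0.435, 0.412, 0.093],
    1945: [5.285, 4.862, 1.941, 0.766, 0.223, 0.169],
    1946: [2.406, 5.000, 3.289, 1.589, 0.224, 0.019],
}

# Catch of ages 4-9 as printed in Table 8 (millions of fish).
table8_catch = {
    1932: 9.398, 1933: 13.829, 1934: 7.044, 1935: 8.522, 1936: 9.041,
    1937: 11.491, 1938: 7.290, 1939: 8.161, 1940: 11.697, 1941: 15.817,
    1942: 13.026, 1943: 13.708, 1944: 20.923, 1945: 13.246, 1946: 12.527,
}

# Sum the six age groups for each year, rounded to thousands of fish.
catch_4_to_9 = {year: round(sum(ages), 3) for year, ages in catch_by_age.items()}

for year in sorted(catch_4_to_9):
    print(year, catch_4_to_9[year], table8_catch[year])
```

The sums agree with the Table 8 column to within 0.001 million fish; the occasional difference in the last digit is what one would expect if the published totals were added before rounding.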


[FIGURE 6. Removals from the Georges Bank population, 1931-1947: catch of Georges Bank haddock in terms of numbers of fish, plotted by year.] 


A casual observation of this table shows, in general, that in years 
during which large numbers of fish were taken from the Georges Bank 
population, there were also large declines in the population size from the 
beginning to the end of the year, and that in years during which small 
numbers were removed, the population changed but little. 

These data have been plotted in Figure 7, with the ‘removal’ or 
“catch (C)” as the independent variable, and with the decline, the 
change in population size from the beginning to the end of the year, as 
the dependent variable. This figure is plotted in a rather unusual man- 
ner, with values of the dependent variable being plotted below, rather 
than above the origin. This has been done inasmuch as values of the 
dependent variable (change in population size) are actually decreases 
rather than increases, and it has been found that this method of plotting 
is more easily interpreted by some people. 




TABLE 7 
AGE COMPOSITION OF CATCH, BY YEARS, IN MILLIONS OF FISH 

                                        Age in years 
Year           1         2         3        4        5        6        7       8    9 and      Total 
                                                                                    older 
1931       1.661     8.167     1.802    5.089    7.975    5.291    2.949   1.555     .490     34.979 
1932        .099     1.712    21.139    2.008    3.051    2.383    1.079    .553     .324     32.348 
1933        .210     7.366     5.218    8.648    1.791    1.346    1.030    .464     .550     26.623 
1934        .296     3.807     4.470    2.889    2.518     .825     .482    .197     .133     15.617 
1935       1.144    11.096     7.803    3.138    2.467    2.053     .415    .360     .089     28.565 
1936        .828    11.449    10.171    4.629    1.803    1.153    1.140    .217     .099     31.489 
1937       1.193    10.129     9.715    4.890    3.574    1.608     .815    .416     .188     32.528 
1938        .961    18.453     6.866    3.304    1.568    1.312     .765    .198     .143     33.570 
1939        .565    12.806    17.379    4.383    1.807     .804     .695    .272     .200     38.911 
1940       1.895     6.692    11.061    7.261    2.188    1.286     .653    .191     .117     31.345 
1941        .697    21.404     9.026    7.389    5.303    1.632     .860    .280     .353     46.944 
1942        .290    13.106    14.877    5.938    3.648    2.131     .941    .205     .162     41.299 
1943        .016     3.653    15.659    7.423    2.742    2.385     .688    .313     .157     33.036 
1944        .054      .675     7.410   13.047    4.945    1.991     .435    .412     .093     29.062 
1945        .101     7.046     1.698    5.285    4.862    1.941     .766    .223     .169     22.091 
1946        .191     6.709    13.251    2.406    5.000    3.289    1.589    .224     .019     32.678 
1947        .088    15.547     9.604    6.660    1.979    2.542    1.397    .690     .424     38.931 
Total     10.289   159.817   167.149   94.388   57.221   33.972   16.700   6.770    3.710    550.016 
Average     .605     9.402     9.833    5.552    3.366    1.998     .982    .398     .218     32.354 


The straight line in Figure 7 was fitted to the data by the method of 
least squares and has the equation, 
Y = -.022 + .1135X, where 
X = millions of haddock of ages 4-9 years removed from the stock in 
each of 15 years by the fishery. 
Y = decrease in relative population size of 4- to 9-year-old haddock 
during each of these 15 years in thousands of fish per day. 


The coefficient of correlation measuring the degree of association 
between these two variables is 0.81. With 13 degrees of freedom this 
value proves to be highly significant (1 per cent level = 0.64). The 
square of the coefficient, r², is 
about 0.66. Thus, it seems valid to conclude (under the assumption 
that the straight line best fits these data) that about 66 per cent of the 
variability in yearly decreases in population size, from the beginning 
to the end of the individual years, is explainable by the variations in 
the numbers of fish actually removed from the stock by the fishery. 
This value of 66 per cent is possibly a minimum estimate of the effect 
of the fishery, inasmuch as the index of abundance is probably not 
perfectly correlated with actual abundance. 
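
The least-squares fit and the correlation coefficient can be checked with the following Python sketch. It is an illustration only: the list names are hypothetical, and the values are the two columns of Table 8 (catch in millions of fish as X, decrease in stock in thousands of fish per day as Y).

```python
from math import sqrt

# Table 8, years 1932-1946 in order.
catch = [9.398, 13.829, 7.044, 8.522, 9.041, 11.491, 7.290, 8.161,
         11.697, 15.817, 13.026, 13.708, 20.923, 13.246, 12.527]      # X
decrease = [0.462, 2.035, 0.808, 1.014, 1.450, 1.311, 0.855, 0.905,
            0.910, 1.495, 1.063, 1.495, 2.404, 1.800, 1.618]          # Y

n = len(catch)
mean_x = sum(catch) / n
mean_y = sum(decrease) / n

# Corrected sums of squares and products.
sxx = sum((x - mean_x) ** 2 for x in catch)
syy = sum((y - mean_y) ** 2 for y in decrease)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(catch, decrease))

slope = sxy / sxx                      # regression coefficient of Y on X
intercept = mean_y - slope * mean_x    # value of Y at X = 0
r = sxy / sqrt(sxx * syy)              # coefficient of correlation

print(f"Y = {intercept:.3f} + {slope:.4f} X,  r = {r:.2f},  r^2 = {r * r:.2f}")
```

Run on the Table 8 figures, the sketch gives a slope close to .1135, an intercept close to -.022, and a correlation near 0.81, in line with the values quoted in the text.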

No attempt was made in this treatment to determine whether some 
curved line fitted these data better than this straight line. 




TABLE 8 
DECREASE IN SIZE OF STOCK OF 4- TO 9-YEAR-OLD HADDOCK FROM 
THE BEGINNING TO THE END OF YEARS 1932-46, AND THE TOTAL 
CATCH OF THESE AGES IN EACH YEAR 


            Decrease in stock,       Catch in 
Year        thousands of fish        millions 
            per day                  of fish 
1932               .462                9.398 
1933              2.035               13.829 
1934               .808                7.044 
1935              1.014                8.522 
1936              1.450                9.041 
1937              1.311               11.491 
1938               .855                7.290 
1939               .905                8.161 
1940               .910               11.697 
1941              1.495               15.817 
1942              1.063               13.026 
1943              1.495               13.708 
1944              2.404               20.923 
1945              1.800               13.246 
1946              1.618               12.527 
Total            19.625              175.720 
Average           1.308               11.715 


This line was arbitrarily extrapolated beyond the limits of the data 
towards the origin, although admittedly the exact position of the line 
where the removals are very small is unknown. It can be seen from 
Figure 7 that the intercept is practically at the 0.0 point. The position 
of this intercept, assuming that the population index actually represents 
the size of the population, poses interesting theoretical possibilities. 

First of all, the suggestion is raised that within the ranges of popula- 
tion size and fishing removals represented by these data, the losses due 
to factors other than the fishing removals, i.e., to natural mortality, 
may be negligible. Such a possibility could theoretically be true under 
the conditions of an intensive fishery, where fishing removals would take 
many fish which would otherwise be removed by natural causes. When 
one takes into consideration the relative lack of bottom dwelling preda- 
tors on Georges Bank that are large enough to consume haddock of 
2-10 pounds and the apparent lack of any disease epidemic or serious 
parasitism in haddock over the 17-year period, this possibility does not 
appear quite so improbable. 


[FIGURE 7. Decrease in population size as affected by catch. The relationship between the yearly removals in millions of haddock (removals by the fishery, horizontal axis, 0 to 25 millions of fish) and the decrease in relative population size from beginning to the end of these same yearly periods in terms of thousands of fish per day (vertical axis, plotted below the origin).] 


Secondly, the exact position of the line with populations of this 
general size, if the removals were greatly reduced and even became zero, 
is unknown. If the extrapolation, as in Figure 7, happens to be an accu- 
rate representation of this relationship, then one would conclude that 
with no fishing removals, as would occur if fishing were to suddenly cease, 
there would be no decrease in the stock and thus no natural mortality. 
Theoretically, however, if fishing were to be considerably reduced sud- 
denly, natural mortality would probably be greater than at present 
because some of the fish now being caught would be vulnerable to what- 
ever causes of natural mortality are in operation. With populations of 
present levels but with very small fishing removals, the line would 




possibly curve toward the Y axis and intersect it at some point greater 
than 0. 

The data and the ideas expressed here refer only to a heavily fished 
population and not to the relatively unfished populations of early days, 
or to the populations which would result if the sudden cessation of fish- 
ing were to continue for several years. In such populations, natural 
mortality would probably be greater yet, for such reasons as poorer 
nutrition of the larger stock, greater average age resulting in more 
deaths from senility, and so on. 

This general situation is to be studied by various lines of approach in 
future studies. From the present study, however, we may conclude that 
the number of haddock caught in various years by the New England 
fleet markedly affected the subsequent population of haddock of corre- 
sponding ages on Georges Bank. Although it is generally assumed in 
many fisheries that the fishery does affect the stock, instances where such 
an effect has been demonstrated clearly are extremely rare. This 
analysis, in addition to demonstrating this relationship, is also of con- 
siderable value in providing the basic data that can be used in determin- 
ing many other very important facts necessary for a broad understanding 
of the biometrics of the valuable New England haddock resource. Such 
facts include the actual number of fish present on the bank, fishing and 
natural mortality rates, growth rates, indices of the recruitment of 
young, the effect of various factors upon recruitment, and predictions 
as to the future abundance of this species. Investigations of these re- 
lationships are now being undertaken and will be reported upon soon. 


ACKNOWLEDGMENTS 


The system of collection of the basic biological and statistical data 
was developed by William C. Herrington, who until 1947 directed the 
North Atlantic Haddock Investigation. Employees of the United 
States Fish and Wildlife Service who aided the writer in summarizing 
the extensive data upon which this analysis is based include: James J. 
Miggins, Edgar L. Arnold, Jr., Louis D. Stringer, Frank A. Dreyer, 
Dorothy B. Monahan, Edward S. Phillips, Sterling L. Cogswell, Eliza- 
beth V. Nugent and Margaret A. Reeves. Scales were read by Edgar 
L. Arnold, Jr., and the writer. Interpretation of certain data collected 
in early years was facilitated through the assistance of John R. Webster. 
The manuscript was reviewed critically by Dr. William F. Royce and 
Dr. Ralph P. Silliman of the Branch of Fishery Biology, United States 
Fish and Wildlife Service. 




ONE DEGREE OF FREEDOM FOR NON-ADDITIVITY*† 


JOHN W. TUKEY 


Princeton University 


INTRODUCTION 


IN DISCUSSING the possible shortcomings of the analysis of variance, 
much attention has been paid to non-constancy and non-normality of 
the “error” contribution. (The recent papers in Biometrics by Eisenhart 
[4], Cochran [3] and Bartlett [1] discuss these matters and give refer- 
ences.) The present writer is usually much more concerned with and 
worried about non-additivity, and until recently has suffered from the 
lack of a systematic way to seek it out, and then to measure it. (Con- 
versations with Frederick F. Stephan have contributed greatly to this 
development and presentation.) 

The purpose of the present paper is to indicate such a way, when the 
data is in the form of a row-by-column table. (The professional practi- 
tioner of the analysis of variance will have no difficulty in extending the 
process to more complex designs.) We shall show how to isolate one 
degree of freedom from the “residue”, “error”, “interaction” or “discrepance”, 
call it what you will. There are two known situations to 
which this single degree of freedom is expected to react by swelling: 


(1) when one or more observations are unusually discrepant; 
(2) when the analysis has been conducted in terms where 
the effects of rows and columns are not additive. 


The first situation is quite familiar and requires little explanation. The 
second occurs often enough, but may not be noticed. An example may 
help to fix the ideas. 

Let us construct an artificial example with 3 rows and 4 columns, 
with each entry contributed to overall, by rows, by columns, and by 
cells. Suppose that these contributions are as follows: 


*Prepared in connection with research sponsored by the Office of Naval Research. 
†Presented to the Biometrics Section and the Biometric Society at Cleveland, December 29, 1948. 


  in general          by rows            by columns           by cells 

  1   1   1   1     4   4   4   4      6   1  -4   0       1  -2   1   0 
  1   1   1   1    -3  -3  -3  -3      6   1  -4   0       0  -1   2  -3 
  1   1   1   1    -3  -3  -3  -3      6   1  -4   0       0  -2  -1   0 


Then the tables and corresponding analyses for the sum of all contribu- 
tions are: 


TABLE 1 
ILLUSTRATIVE EXAMPLE IN ORIGINAL TERMS 


            Values and Means                          Analysis of Variance 

                                 Sums   Means                     DF     SS     MS 
          12     4     2     5     23     5.8 
           4    -2    -4    -5     -7    -1.8 
           4    -3    -7    -2     -8    -2.0 
                                                      Rows         2    140     70 
Sums      20    -1    -9    -2      8                 Columns      3    157     52 
Means    6.7  -0.3  -3.0  -0.7            0.7         R X C        6     26      4 


Now let us square the entries and divide by 10, rounding to integers. 
The resulting tables and analyses are: 


TABLE 2 
ILLUSTRATIVE EXAMPLE IN TERMS OF SQUARES 


            Values and Means                          Analysis of Variance 

                                 Sums   Means                     DF     SS     MS 
          14     2     1     2     19     4.8 
           2     0     2     2      6     1.5 
           2     1     5     0      8     2.0 
                                                      Rows         2   24.5   12.2 
Sums      18     3     8     4     33                 Columns      3   46.9   15.6 
Means    6.0   1.0   2.7   1.3            2.8         R X C        6   84.8   14.1 


Notice that all semblance of row or column effects has now van- 
ished, although Table 1 showed large and significant effects. The use 
of the squared scale has concealed the real effects. (It may be argued 
that squaring numbers which range from plus to minus is unrealistic. 
The answer is that this is an extreme example, but one that can be slowly 
and smoothly changed into a very mild one. There probably is a difference 
in degree between this example and what happens in practice, but 
there is no difference in kind.) 
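
The sums of squares in the analysis of variance of Table 2 can be reproduced with a short Python sketch. The function name `two_way_ss` is hypothetical, and the code assumes an ordinary row-by-column table with one entry per cell.

```python
def two_way_ss(table):
    """Return (rows SS, columns SS, R x C SS) for a row-by-column table."""
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (r * c)
    # Sums of squares from row totals and from column totals.
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    ss_total = sum(x * x for row in table for x in row) - correction
    # Whatever is left after rows and columns is the R x C residue.
    return ss_rows, ss_cols, ss_total - ss_rows - ss_cols


table2 = [[14, 2, 1, 2],
          [2, 0, 2, 2],
          [2, 1, 5, 0]]

ss_rows, ss_cols, ss_rxc = two_way_ss(table2)
print(round(ss_rows, 1), round(ss_cols, 1), round(ss_rxc, 1))
```

With the entries of Table 2 this gives 24.5, 46.9 and 84.8 for rows, columns and R X C, which are the sums of squares shown in that table.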


PROCEDURE 


How then do we isolate the single degree of freedom? The process 
is simple, and runs as follows: 


(A) To the row-by-column table, already bordered with sums and means, 
add a new border of deviations of means from the grand mean 
(decimal places may be reduced, but the sums of deviations, by 
rows and by columns must be forced to vanish). 

(B) Add an extra column (or row) and enter in each cell the sum of 
products of the deviations by columns and the entries in its row 
(or column). 

(C) Accumulate the sum of products between the deviations of row 
(or column) means and the new entries of (B). 

(D) Calculate the sum of squares of deviations by columns and by rows. 

(E) Divide the square of the number from (C) by the product of the 
numbers from (D). This is the mean square (and also the sum of 
squares) for the single degree of freedom. 


The process is illustrated on the same example below: 


TABLE 3 
SAMPLE CALCULATION 


                                                           Devia-    Sums of 
                                         Sums     Means    tions     x-products 

                14      2      1      2    19      4.75      2.0       38.4 
                 2      0      2      2     6      1.50     -1.2        3.6 
                 2      1      5      0     8      2.00     -0.8        4.6 

Sums            18      3      8      4    33                0.0       68.8 
Means         6.00   1.00   2.67   1.33            2.75      6.08 
Deviations     3.2   -1.8    0.0   -1.4   0.0               15.44      50.9 


(B): 14(3.2) + 2(-1.8) + 1(0.0) + 2(-1.4) = 38.4 
      2(3.2) + 0(-1.8) + 2(0.0) + 2(-1.4) = 3.6 
      2(3.2) + 1(-1.8) + 5(0.0) + 0(-1.4) = 4.6 

(C): 38.4(2.0) + 3.6(-1.2) + 4.6(-0.8) = 68.8 

(D): (3.2)² + (-1.8)² + (0.0)² + (-1.4)² = 15.44 
     (2.0)² + (-1.2)² + (-0.8)² = 6.08 

(E): (68.8)² / [(15.44)(6.08)] = 50.9 

Assigning the mean square 50.9 to the degree of freedom for non-addi- 
tivity, which is subtracted from ““R X C”’, the analysis of variance of 
Table 2 becomes: 


Rows 2 24.5 12.2 
Columns 3 46.9 15.6 
Non-additivity 1 50.9 50.9 
Balance 5 33.9 6.8 


Thus the obvious thing about the illustrative example was its non-addi- 
tivity. The corresponding F value of 7.3 on 1 and 5 degrees of freedom 
is significant at the 5% level. 
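
Steps (A) to (E) can be written out as a small Python sketch. This is a minimal illustration, not a definitive implementation: the function name `nonadditivity_ss` and its optional arguments are hypothetical. When no deviations are supplied, the function computes them without rounding; the call below passes the rounded deviations of Table 3 so that the intermediate figures match the worked example.

```python
def nonadditivity_ss(table, row_dev=None, col_dev=None):
    """Return (step B entries, step C sum, step E sum of squares)."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    # Step (A): deviations of row and column means from the grand mean.
    if row_dev is None:
        row_dev = [sum(row) / c - grand for row in table]
    if col_dev is None:
        col_dev = [sum(col) / r - grand for col in zip(*table)]
    # Step (B): for each row, sum of products of entries and column deviations.
    cross = [sum(x * d for x, d in zip(row, col_dev)) for row in table]
    # Step (C): sum of products of row deviations and the step (B) entries.
    final_sum = sum(d * p for d, p in zip(row_dev, cross))
    # Step (D): sums of squares of the deviations by columns and by rows.
    ss_col_dev = sum(d * d for d in col_dev)
    ss_row_dev = sum(d * d for d in row_dev)
    # Step (E): square of (C) divided by the product of the numbers from (D).
    return cross, final_sum, final_sum ** 2 / (ss_col_dev * ss_row_dev)


table2 = [[14, 2, 1, 2],
          [2, 0, 2, 2],
          [2, 1, 5, 0]]

# Rounded deviations as bordered on Table 3 (each set sums to zero).
cross, final_sum, ss_nonadd = nonadditivity_ss(
    table2, row_dev=[2.0, -1.2, -0.8], col_dev=[3.2, -1.8, 0.0, -1.4])

print([round(v, 1) for v in cross], round(final_sum, 1), round(ss_nonadd, 1))
```

The sketch reproduces the cross-products 38.4, 3.6 and 4.6 and the final sum 68.8. The quotient in step (E) evaluates to about 50.4 with these rounded deviations, close to but not identical with the 50.9 entered in the analysis of variance, so the last digits of the worked example should be read with some caution.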


EXPLANATION 


We have explained what we are looking for—non-additivity—and 
how to look—last section—but we have not explained what we are really 
doing. This we shall now try to do. Those experienced with single 
degrees of freedom may have already recognized the computation as a 
short-cut method of eliminating the single degree of freedom labeled by 


                 3.2     -1.8      0.0     -1.4 

       2.0      6.40    -3.60     0.00    -2.80 
      -1.2     -3.84     2.16     0.00     1.68 
      -0.8     -2.56     1.44     0.00     1.12 


where 6.40 = (2.0)(3.2), -3.60 = (-1.8)(2.0), 2.16 = (-1.8)(-1.2) 
and so on. We have used the products of the deviations of the row means 
and the deviations of the column means to label this single degree of 
freedom. Since the sum of each column and of each row is zero, this 
degree of freedom is orthogonal to rows and to columns. It must be a 
part of “R X C”. This is what we did, but why? 



Let us take a special case, where there are row contributions, and 
column contributions, and nothing else. We start with perfect additivity. 
If $x_i$ is the column contribution (where $i$ goes from 1 to $c$, the number of 
columns), and if $y_j$ is the row contribution (where $j$ goes from 1 to $r$, the 
number of rows), then the $ij$ entry in the table is 

$$a_{ij} = x_i + y_j .$$

Now let us start to analyze a slightly nonlinear function of the $a_{ij}$. 
Instead of $a_{ij}$, consider 

$$f(a_{ij}) = a_{ij} + \lambda (a_{ij} - a)^2$$

where $\lambda$ is a small constant, and $a$ is, for convenience, the average $\bar{x} + \bar{y}$ 
of all the $a_{ij}$. We find that we can write 

$$f(a_{ij}) = \left[x_i + \lambda (x_i - \bar{x})^2\right] + \left[y_j + \lambda (y_j - \bar{y})^2\right] + 2\lambda (x_i - \bar{x})(y_j - \bar{y}).$$

The first two terms depend, respectively, on the column alone and on the 
row alone, so the last one contains all the non-additive effect due to 
analysis in terms of $f(a)$ instead of in terms of $a$. Notice that this non- 
additive effect is a multiple of 

$$(x_i - \bar{x})(y_j - \bar{y}).$$

This means that it occurs in a single degree of freedom, which is identified 
in terms of $x_i - \bar{x}$ and $y_j - \bar{y}$. 
We assumed no error of measurement, or the like, and we wrote 
$a_{ij} = x_i + y_j$ without an additional term. This means that the differ- 
ence between the $i$-th column mean and the grand mean is 

$$(x_i - \bar{x}) + \lambda \left[(x_i - \bar{x})^2 - \frac{1}{c}\sum_{k=1}^{c} (x_k - \bar{x})^2\right],$$

which is nearly $x_i - \bar{x}$ when $\lambda$ is small. Thus a satisfactory approxima- 
tion to the single degree of freedom we want is that indicated by the 
coefficients 


(column mean - grand mean)(row mean - grand mean). 


This is exact for the combination of no error and a very slight change 
from $a$ to $f(a)$, that is for no error and $\lambda$ small. This fact plus empirical 
tests seems enough to warrant recommending general use of this single 
degree of freedom as a test of non-additivity. 
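
The algebra of this section can be checked numerically. The Python sketch below is an illustration under stated assumptions: the row and column contributions are borrowed from the artificial example of the Introduction, the value of `lam` (standing for the small constant in the transformation) is an arbitrary choice, and all names are hypothetical. It builds a perfectly additive table, applies the slightly nonlinear function, removes row and column means, and compares what is left with twice the constant times the product of the two deviations.

```python
col_contrib = [6.0, 1.0, -4.0, 0.0]   # x_i, one per column
row_contrib = [4.0, -3.0, -3.0]       # y_j, one per row
lam = 0.01                            # small constant (arbitrary choice)

c, r = len(col_contrib), len(row_contrib)
x_bar = sum(col_contrib) / c
y_bar = sum(row_contrib) / r
a_bar = x_bar + y_bar                 # average of all a_ij = x_i + y_j


def f(a):
    """Slightly nonlinear function of an additive entry."""
    return a + lam * (a - a_bar) ** 2


# Transformed table, indexed [row j][column i].
table = [[f(x + y) for x in col_contrib] for y in row_contrib]

grand = sum(sum(row) for row in table) / (r * c)
row_means = [sum(row) / c for row in table]
col_means = [sum(col) / r for col in zip(*table)]

# Residual after removing row and column effects, compared with the
# predicted non-additive term 2 * lam * (x_i - x_bar) * (y_j - y_bar).
max_gap = 0.0
for j, y in enumerate(row_contrib):
    for i, x in enumerate(col_contrib):
        residual = table[j][i] - row_means[j] - col_means[i] + grand
        predicted = 2 * lam * (x - x_bar) * (y - y_bar)
        max_gap = max(max_gap, abs(residual - predicted))

print(max_gap)
```

In the absence of error the two agree up to floating-point rounding, which is the sense in which the whole non-additive effect sits in a single degree of freedom.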


WHAT OF SIGNIFICANCE? 


Suppose that the test shows statistically significant evidence of 
non-linearity—what then? The simplest and laziest thing to do would 




be to forget the degree of freedom for non-additivity and go on and use 
the mean square for the balance in considering for example, the signifi- 
cance of the row effects. This is not recommended, for the following 
reasons: 


(1) In general, results expressed in terms in which effects are 
additive apply in a broader region and are practically 
more useful. 

(2) If the “error” or fluctuating contribution is not normally 
distributed, then it is not known whether or not the use 
of the balance mean square unduly inflates the apparent 
significance of other mean squares (for the case of a nor- 
mally distributed fluctuating contribution there is no 
distortion of significance.) 


For these reasons, the occurrence of a large non-additivity mean square 
should lead to consideration of a transformation followed by a new 
analysis of the transformed variable. 


This consideration should include two steps: 


(a) inquiry whether the non-additivity was due to analysis 
in the wrong form or to one or more unusually discrepant 
values; 

(b) in case no unusually discrepant values are found or indi- 
cated, inquiry into how much of a transformation is 
needed to restore additivity. 


The decision under (a) will depend on an examination of the data and 
all the background information available in the field—in particular the 
result of similar inspections of other experiments for non-additivity. 
What seems to be the best way of inspecting the results of a single experi- 
ment so far proposed is to plot the entries in the new column (of sums of 
cross-products) against the corresponding row means. A single unusu- 
ally discrepant observation will tend to be reflected by one point high 
or low and the others distributed around a nearly horizontal regression 
line. An analysis in the wrong terms will tend to be reflected by a 
slanting regression line. 


The figure shows such a plot, including 2s limits, for 


(A) the illustrative example worked above, 

(B) Youden and Beale’s data [6] as simplified by Snedecor 
    [5, p. 44], 

(C) Beall’s experiment VI [2] on insect infestation, with plots 
    treated alike combined (analyzed in terms of numbers of 
    insects), 

(D) Cochran’s example [3] of an obviously discrepant value. 


[FIGURE. Graphical analysis of nonadditivity (ordinates are sums of cross products, dashed lines are 2s limits). Panel A, illustrative example: abscissa, mean of row. Panel B, Youden and Beale: abscissa, mean number of lesions per leaf.] 


[FIGURE, continued. Panel C, Beall: abscissa, mean number of insects per plot-pair. Panel D, Cochran: abscissa, mean ratio of dry to wet grain.] 


The limits are set by the formula 

$$\left(\text{average cross product}\right) \pm 2\left[\left(\text{sum of squares of deviations of column means}\right)\left(\text{mean square for balance}\right)\right]^{1/2}.$$

For the illustrative example (Case A), this becomes 

$$15.53 \pm 2\,(15.44)^{1/2}(6.8)^{1/2} = 15.5 \pm 20.5 = -5.0 \text{ and } +36.0.$$
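
As a check on the arithmetic, the limits for Case A can be computed directly. The variable names in this Python sketch are hypothetical; the inputs are the sums of cross-products from Table 3, the sum of squares of the column deviations, and the balance mean square quoted in the text.

```python
from math import sqrt

cross_products = [38.4, 3.6, 4.6]   # new column of Table 3
ss_col_dev = 15.44                  # sum of squares of deviations of column means
ms_balance = 6.8                    # mean square for balance

centre = sum(cross_products) / len(cross_products)
half_width = 2 * sqrt(ss_col_dev * ms_balance)
lower, upper = centre - half_width, centre + half_width

print(round(centre, 2), round(half_width, 1), round(lower, 1), round(upper, 1))
```

This gives a centre of 15.53 and a half-width of about 20.5, hence limits of roughly -5.0 and +36.0.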


In every one of the four cases, the plotted points could be accounted 
for by non-additivity due to analysis in incorrect terms. Cases A and D 
can also be accounted for by a discrepant point. This suggests that it 
will be hard to make this distinction for single experiments on this scale. 
When several small experiments are available for analysis, agreement in 
signs of the slopes of the graphs or equivalently, the signs of the sums 
obtained in Step C may show up analysis in incorrect terms. 

Why does the graph fail to decide about Cases A and D? The reason 
is simple—either explanation is plausible. If in Case A we alter the 
upper left-hand entry from 14 to 2, the analysis of variance becomes: 


                    DF      SS      MS 
Rows                 2     0.5     0.2 
Columns              3     4.9     1.6 
Non-additivity       1     0.2     0.2 
Balance              5    12.6     2.5 


Thus we see that our illustrative table of 3 X 4 entries could have per- 
fectly well come from an additive situation where exactly one entry has 
been seriously disturbed. 

Similarly in Case D, taken from Cochran’s paper, if a nonlinear 
function is chosen so that 


$$g(y) = \begin{cases} y, & .704 \le y \le .792, \\ .800, & y = 1.035, \end{cases}$$


then his table is converted into one where the F-ratio for non-additivity 
against balance is 0.8 instead of 27.6. We know that this table arose 
from an error in computation, but it could equally well have come from 
an additive table analyzed in the wrong terms. 

In each case, the graphical solution has gone as far as it reasonably 




could in assigning responsibility for the non-additivity. While the 
graphical analysis is not certain to settle Step (a), it may be expected 
to be a big help. 


AID IN CHOOSING A TRANSFORMATION 


If it has been decided that the wrong terms had been used, then the 
actual size of the mean square for non-additivity must be useful for 
choosing an appropriate transformation. We lack experience with the 
more delicate use of such information, so that it seems appropriate to 
stop here with the following table which shows the connection between 
the sign of the final sum of products (which was +68.8 in the illustrative 
example) and the type of transformation which may then be appropriate. 


TABLE 4 


SIGN OF FINAL SUM OF PRODUCTS WHEN CERTAIN TRANSFORMATIONS 
ARE APPROPRIATE (VALUES OF x OR x + a NON-NEGATIVE) 


Transformed 
values which       Conditions      Sign when x      Important 
are additive*      needed          is analyzed      special cases 

                   p < 1                +           √x, √(x + 1) 
x^p or 
(x + a)^p          p = 1                0           (x) 

                   p > 1                -           x², x³ 

log (x + a)        (none)               +           log x, log (1 + x) 


*Multiplication by a fixed constant and addition or subtraction of a fixed constant freely possible 
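
The direction of the signs in such a table can be illustrated numerically. The Python sketch below is a hedged illustration, not part of the original method: the additive table, the constant 0.3 in the exponential, and all names are hypothetical choices. It starts from positive, perfectly additive values, converts them to the scale on which they would be "observed" if a given transformation were the appropriate one, and computes the final sum of products of Step (C) with unrounded deviations.

```python
import math


def final_sum_of_products(table):
    """Step (C): sum of (row deviation) x (sum of entries x column deviations)."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_dev = [sum(row) / c - grand for row in table]
    col_dev = [sum(col) / r - grand for col in zip(*table)]
    return sum(rd * sum(x * cd for x, cd in zip(row, col_dev))
               for rd, row in zip(row_dev, table))


# A perfectly additive table of positive values (hypothetical contributions).
row_part = [1.0, 2.0, 4.0]
col_part = [1.0, 3.0, 5.0, 8.0]
additive = [[ri + cj for cj in col_part] for ri in row_part]


def observed(inverse):
    """Values on the scale of analysis when the transformed values are additive."""
    return [[inverse(u) for u in row] for row in additive]


sum_if_sqrt_additive = final_sum_of_products(observed(lambda u: u ** 2))
sum_if_square_additive = final_sum_of_products(observed(math.sqrt))
sum_if_log_additive = final_sum_of_products(observed(lambda u: math.exp(0.3 * u)))
sum_if_x_additive = final_sum_of_products(additive)

print(sum_if_sqrt_additive, sum_if_square_additive,
      sum_if_log_additive, sum_if_x_additive)
```

For this example the sum is positive when the square root or the logarithm is the additive scale, negative when the square is, and zero (up to rounding) when the values analyzed are already additive.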


While the removal of non-additivity by transformation usually tends 
to stabilize the variance, there may be cases where the variance is no- 
tably non-constant after transformation. In such cases, analysis of the 
transformed data using weights seems appropriate. 


APPENDIX 
VALIDITY OF THE ANALYSIS 


This section is prepared for those who may feel that the method of 
obtaining the “single degree of freedom” may not produce quantities 
with the usual distribution. 

The basic fact is this: If $u_1, u_2, \ldots, u_m$; $v_1, v_2, \ldots, v_n$
have some joint distribution, and if, for fixed $u_1, u_2, \ldots, u_m$, the
distribution of $v_1, v_2, \ldots, v_n$ exists and is always the same, then
the marginal distribution of $v_1, v_2, \ldots, v_n$ exists and, indeed, is
the same, and, furthermore, $(u_1, u_2, \ldots, u_m)$ and
$(v_1, v_2, \ldots, v_n)$ are independent. This can be established either
by general considerations or by analytical detail.

To apply this in our case, let $u_1, u_2, \ldots, u_m$ be the row and column
means, and let $v_1$ and $v_2$ be the sums of squares for non-additivity and
for the balance. If the situation is additive, and the cell effects are
normally distributed, and $u_1, u_2, \ldots, u_m$ are fixed, then $v_1$ and
$v_2$ are independently distributed like $\sigma^2$ times chi-squares on 1
and $rc - r - c$ degrees of freedom. Hence $v_1$ and $v_2$ have these
distributions, and are independent of all functions of row and column
means. Thus the F-tests of rows, columns, or non-additivity against
balance are valid.

In the presence of non-additivity and/or non-normality, the usual 
arguments indicate that the F-test is, if anything, conservative. 
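
The split into one degree of freedom for non-additivity and a balance on
$rc - r - c$ degrees of freedom can be illustrated with a short sketch.
It assumes a complete two-way table with one value per cell; the function
name `tukey_one_df` and the returned keys are illustrative choices, not
notation from the paper.

```python
def tukey_one_df(table):
    """Split the interaction sum of squares of an r x c table (one value per
    cell) into one degree of freedom for non-additivity and the balance."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_dev = [sum(row) / c - grand for row in table]
    col_dev = [sum(row[j] for row in table) / r - grand for j in range(c)]

    # Final sum of products: each cell value weighted by the product of its
    # row deviation and column deviation.
    sum_products = sum(row_dev[i] * col_dev[j] * table[i][j]
                       for i in range(r) for j in range(c))
    scale = sum(a * a for a in row_dev) * sum(b * b for b in col_dev)
    ss_nonadd = sum_products ** 2 / scale if scale > 0 else 0.0

    # Interaction sum of squares left after fitting rows and columns.
    ss_inter = sum((table[i][j] - grand - row_dev[i] - col_dev[j]) ** 2
                   for i in range(r) for j in range(c))
    ss_balance = ss_inter - ss_nonadd
    df_balance = r * c - r - c

    # F-ratio of non-additivity against balance; left undefined when the
    # balance is numerically zero (e.g. an exactly multiplicative table).
    f_ratio = None
    if ss_balance > 1e-12 * max(ss_inter, 1.0):
        f_ratio = ss_nonadd / (ss_balance / df_balance)

    return {"sum_products": sum_products, "ss_nonadd": ss_nonadd,
            "ss_interaction": ss_inter, "ss_balance": ss_balance,
            "df_balance": df_balance, "f_ratio": f_ratio}
```

For an exactly multiplicative table (cell value = row factor times column
factor) the whole interaction sum of squares falls into the single degree
of freedom and the sum of products is positive; for an exactly additive
table the sum of products vanishes.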


REFERENCES 


[1] Bartlett, Maurice S. The Use of Transformations. Biometrics 3, 39-57, 1947.

[2] Beall, Geoffrey. The Transformation of Data from Entomological Field
Experiments so that the Analysis of Variance becomes Applicable.
Biometrika 32, 243-262, 1942.

[3] Cochran, W. G. Some Consequences when the Assumptions for the Analysis of
Variance are not Satisfied. Biometrics 3, 22-38, 1947.

[4] Eisenhart, Churchill. The Assumptions underlying the Analysis of Variance.
Biometrics 3, 1-21, 1947.

[5] Snedecor, George. Statistical Methods. The Collegiate Press, Ames, Iowa;
4th edition, 1947.

[6] Youden, W. J. and Beale, Helen Purdy. A Statistical Study of the Local Lesion
Method for Estimating Tobacco Mosaic Virus. Contributions from the Boyce
Thompson Institute 6, 437-454, 1934.




ON A STATISTICAL APPROXIMATION TO THE 
INFECTION INTERVAL 


J. B. CHASSAN*


IN A PREVIOUS PAPER (2) the existence of strong correlation between the
logarithms of the morbidity rates of a group of respiratory diseases
for successive calendar month-pairs was demonstrated. The case rates
involved pertain to the combined incidence of catarrhal bronchitis, acute
coryza, acute catarrhal pharyngitis and laryngitis, and influenza, as
diagnosed and reported in the United States Army. Where $C_i$ is the
case rate observed in the $i$-th calendar month, and $C_{i+1}$ the
corresponding rate observed in the succeeding calendar month of the same
year (or the same winter when $i$ = December), the value of
$r_{\log C_i,\, \log C_{i+1}}$ for the twelve month-pairs averaged .84,
each of the twelve coefficients being based upon some 38 observations,
according to the number of years for which data were available for each
month-pair. The purpose of the present paper is to relate some of the
results obtained in connection with ref. (2) to the law of mass action in
epidemiology, and to derive therefrom an estimate of the infection
interval for an assumed period of immunity following infection, or
conversely, an estimate of the period of immunity corresponding to a
known infection interval. In connection with the actual numerical values
presented, it should be noted that they pertain to a group of diseases
and therefore can be interpreted only as averages for the group as a
whole.

The law of mass action in epidemiology states that the rate at which a 
contagious or epidemic disease spreads in a community is proportional 
to the product of the number of infectious individuals and the number of 
susceptibles in the community. If two consecutive time intervals are 
chosen such that the length of each interval is equal to the period 
between contact and case manifestation (i.e., the incubation period), a 
contact between an infectious person and a susceptible in the first 
interval will result in a new case during the second. Then the law of 
mass action may be written as 


*The author wishes to acknowledge the helpful criticism and suggestions of Prof. John W. Tukey 
of Princeton University in connection with this paper. 


$$C_{i+1} = \frac{1}{m} S_i I_i \tag{1}$$

in which 


(a) $C_{i+1}$ is the expected number of cases (or the case rate) during
the $(i+1)$-th period.

(b) $S_i$ is the average number of susceptibles in the $i$-th period.

(c) $I_i$ represents the average number of infectious individuals during
the $i$-th period.

(d) $m^{-1}$ is the proportionality constant reflecting such factors as
the degree of crowding in a community, seasonality; more abstractly
"infective power".


For the case in which the period of communicability following infection
is relatively short, it is convenient to consider incidence in successive
intervals whose lengths are each equivalent to the infection interval,
rather than to the incubation period. The infection interval may be
defined as the average period between the manifestations of two cases,
one case resulting from contact with the other. It can be regarded as the
sum of two components: first, the (average) time it takes for adequate
contact to take place between a newly infected person and a susceptible,
and then, the period between contact and manifestation. In such a
case, we may replace $I_i$ by $C_i$ in equation (1), obtaining Soper's
formula,

$$C_{i+1} = \frac{1}{m} S_i C_i \tag{2}$$


which gives the relationship between incidence rates in two consecutive 
periods whose lengths are each equal to that of the infection interval. 

Soper (1) has also stated the relationship for the case in which the
incidence rates are taken over successive periods of arbitrary length. If
$C_i$ is the case rate observed during the $i$-th month, and $S_i$ is the
average number of susceptibles in the $i$-th month, then

$$C_{i+1} = \left(\frac{1}{m}\right)^p S_i^p \, C_i \tag{3}$$

where $C_{i+1}$ is the incidence rate in the $(i+1)$-th month, and $p$
represents the number of infection intervals in one month.
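
The step from Soper's formula (2) to the monthly form (3) can be checked
numerically: applying (2) over $p$ successive infection intervals, with
the number of susceptibles held at its monthly average, multiplies the
case rate by $(S_i/m)^p$. The sketch below assumes that constant-$S_i$
simplification; the function names are illustrative only.

```python
def soper_step(cases, susceptibles, m):
    """Equation (2): case rate one infection interval later."""
    return susceptibles * cases / m


def cases_after_intervals(cases, susceptibles, m, p):
    """Equation (3): case rate after p infection intervals, with the
    number of susceptibles treated as constant over the month."""
    return (susceptibles / m) ** p * cases


def iterate_soper(cases, susceptibles, m, p):
    """Apply equation (2) p times in succession (p a whole number)."""
    for _ in range(p):
        cases = soper_step(cases, susceptibles, m)
    return cases
```

For a whole number of intervals the two routes agree, for example
`iterate_soper(2.0, 950.0, 900.0, 15)` and
`cases_after_intervals(2.0, 950.0, 900.0, 15)`.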

If $C_i$ is expressed as a daily incidence rate in terms of the number
infected per day out of each 1000 population, and if $n$ is the number of
days of immunity following infection, then $nC_i$ will give the average
number per 1000 population who are not susceptible, by virtue of recent
infection, during the month in which $C_i$ is observed. On the assumption
of general susceptibility in the population, the corresponding number of
susceptibles per 1000 will then be given by

$$S_i = 1000 - nC_i \tag{4}$$

Substituting this value in (3), we obtain

$$C_{i+1} = \left(\frac{1}{m}\right)^p (1000 - nC_i)^p \, C_i$$

or

$$\log C_{i+1} = p \log m^{-1} + p \log (1000 - nC_i) + \log C_i \tag{5}$$


Interpreting this equation in a statistical sense, i.e., as a regression
function in which $x_{i+1} = \log C_{i+1}$ is regarded as the average value
corresponding to a fixed observation of $x_i = \log C_i$, the data for the
group of respiratory diseases under consideration indicate that the true
regression curve of $x_{i+1}$ on $x_i$ increases monotonically with slowly
declining slope over the actual range of observations. Apart from
sampling differences a straight line of the form

$$x_{i+1} = a + b x_i \tag{6}$$

fitted by the method of least squares should lie close to the regression
curve over the range of observed values of $x_i$, and the slope of the
line, $b$, should very nearly equal the slope, $\beta$, of the secant which
intersects the true regression curve at points corresponding to the
lowest and highest of the observed values of $x_i$, respectively. An
approximation to the infection interval can then be obtained by equating
$b$, the slope of the linear regression of $x_{i+1}$ on $x_i$, to $\beta$,
the slope of the secant.

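If $C_{i1}$ represents the lowest of the observed rates in the $i$-th
month, and $C_{i2}$ the highest, their substitution, in turn, for $C_i$ in
equation (5) above yields, as co-ordinates of the secant at the points of
intersection with the mass action curve, the points

$$\left(\log C_{i1},\; p \log m^{-1} + p \log (1000 - nC_{i1}) + \log C_{i1}\right)$$

and

$$\left(\log C_{i2},\; p \log m^{-1} + p \log (1000 - nC_{i2}) + \log C_{i2}\right)$$

respectively, where each ordinate is expressed as a function of the
corresponding abscissa.

Then, from elementary analytic geometry the slope of the secant
will be

$$\beta = 1 + p\,\frac{\log \dfrac{1000 - nC_{i2}}{1000 - nC_{i1}}}{\log \dfrac{C_{i2}}{C_{i1}}} \tag{7}$$

Solving for $p$, and substituting $b$ for $\beta$,

$$p = (b - 1)\,\frac{\log \dfrac{C_{i2}}{C_{i1}}}{\log \dfrac{1000 - nC_{i2}}{1000 - nC_{i1}}} \tag{8}$$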


Upon applying formula (8) to the data of reference (2) for the twelve 
month-pairs, a median value of p = 15 was obtained on the assumption 
of three weeks of incidence as equivalent to the number of non-suscepti- 
bles, i.e., when n = 21. Since the incidence data were taken over 
monthly intervals, the corresponding estimate of the average infection 
interval is 2.0 days. On the assumption that n = 28, the median value 
of p is 11, and the infection interval, 2.8 days. Finally, if the assumption 
is made that n = 42, an average infection interval of 4.1 days is esti- 
mated. Thus in the neighborhood of the assumed values of n, the ratio 
of the period of immunity to the length of the infection interval is 
approximately 10: 1. 
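
The estimate can be sketched in a few lines. The functions below follow
equations (7) and (8) as read here: the secant slope is one plus $p$
times the ratio of $\log[(1000 - nC_{hi})/(1000 - nC_{lo})]$ to
$\log(C_{hi}/C_{lo})$, and (8) inverts that relation. The 30-day month
and all names are assumptions made for illustration; the paper's data
are not reproduced.

```python
import math

DAYS_PER_MONTH = 30.0  # assumed month length; the paper does not state one


def secant_slope(p, n, c_low, c_high):
    """Equation (7): slope of the secant to the mass action curve (5)
    between the lowest and highest observed daily case rates."""
    damping = math.log((1000.0 - n * c_high) / (1000.0 - n * c_low))
    return 1.0 + p * damping / math.log(c_high / c_low)


def intervals_per_month(b, n, c_low, c_high):
    """Equation (8): number of infection intervals per month, p, from the
    least squares slope b and an assumed immunity period of n days."""
    damping = math.log((1000.0 - n * c_high) / (1000.0 - n * c_low))
    return (b - 1.0) * math.log(c_high / c_low) / damping


def infection_interval_days(p, days_per_month=DAYS_PER_MONTH):
    """Average infection interval implied by p intervals per month."""
    return days_per_month / p
```

With these conventions `infection_interval_days(15)` gives 2.0 days,
which agrees with the figure quoted for $n = 21$. Because (8) is a ratio
of logarithms, the base of the logarithms does not matter.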
Illustrating the procedure graphically, the chart given shows: 


(a) a plotting of observed points corresponding to the February-March
relationship

(b) a theoretical drawing of (5), represented by the curve MM', and
interpreted as a regression curve

(c) the least squares linear regression of $x_{i+1}$ on $x_i$, LL', fitted
to the scatter of points

(d) a secant to the curve MM', drawn as a dashed line; the secant
is drawn so that it intersects the curve to the left at the point
whose abscissa is $\log C_{i1}$, where $C_{i1}$ is the smallest of the
observed values of $C_i$, and to the right, at the point whose abscissa
is $\log C_{i2}$, where $C_{i2}$ is the largest of the observed values
of $C_i$.


[Figure: GRAPHICAL REPRESENTATION OF THE ESTIMATING RELATIONSHIPS IN
THE APPROXIMATION TO THE INFECTION INTERVAL. Ordinate:
$x_{i+1} = \log C_{i+1}$ (March incidence); abscissa: $x_i = \log C_i$
(February incidence). The secant to the law of mass action curve,
intersecting the curve at the extremes of $x_i$, is assumed parallel to
the least squares line of regression.]

The position of the curve MM' in relation to its secant and to the
least squares line LL' (again, apart from sampling errors) can be
determined by formulating the vertical distance between MM' and the
secant. By differentiation, both the maximum distance and the value
of $\log C_i$ at which the maximum distance occurs can be determined.
Thus if the equation of the secant is given by

$$\log C_{i+1} = \alpha + \beta \log C_i \tag{9}$$

it will be found, by substituting $C_{i1}$ for $C_i$ in equation (9) and
in (5), that

$$\alpha = p \log m^{-1} + p \log (1000 - nC_{i1}) + (1 - \beta) \log C_{i1}.$$

The distance from the secant to the curve will then be

$$\phi = p \log \frac{1000 - nC_i}{1000 - nC_{i1}} + (1 - \beta) \log \frac{C_i}{C_{i1}} \tag{10}$$

where $b$ can be substituted for $\beta$, and $p$ is obtained from (8).

The maximum value of $\phi$ can, of course, be obtained by
differentiation with respect to $C_i$, or $\log C_i$, and equating to
zero. Then the curve MM' can be closely approximated from the position
of the least squares line. Taking

$$M = L + \phi - \tfrac{1}{2} \max \phi$$

where, for a fixed value of $\log C_i$, $L$ is the corresponding value of
$\log C_{i+1}$ on the least squares line, and $\phi$ is taken from (10),
$M$ is the corresponding value of $\log C_{i+1}$ on the curve.
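
The differentiation can be carried through explicitly. Writing the
distance of equation (10) as $p \log[(1000 - nC)/(1000 - nC_{lo})] +
(1 - \beta) \log(C/C_{lo})$ and setting its derivative with respect to
$C$ to zero gives $C^* = 1000(1 - \beta) / [n(p + 1 - \beta)]$. The
sketch below uses natural logarithms (a change of base rescales the
distance but leaves $C^*$ unchanged) and hypothetical function names.

```python
import math


def distance_curve_to_secant(c, p, n, beta, c_low):
    """Equation (10): vertical distance from the secant up to the mass
    action curve at daily case rate c (natural logarithms)."""
    return (p * math.log((1000.0 - n * c) / (1000.0 - n * c_low))
            + (1.0 - beta) * math.log(c / c_low))


def case_rate_at_max_distance(p, n, beta):
    """Case rate at which the distance is greatest, from
    -p*n/(1000 - n*c) + (1 - beta)/c = 0."""
    return 1000.0 * (1.0 - beta) / (n * (p + 1.0 - beta))


def curve_from_least_squares(l_value, phi, phi_max):
    """Approximate ordinate of the curve MM': M = L + phi - max(phi)/2."""
    return l_value + phi - 0.5 * phi_max
```

The distance is zero at the lowest observed rate by construction, and is
zero again at the highest observed rate when $\beta$ is the secant slope
of equation (7).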

For an assumed value of $p$, equation (8) can be solved for $n$, since
it gives

$$\frac{1000 - nC_{i2}}{1000 - nC_{i1}} = \left(\frac{C_{i2}}{C_{i1}}\right)^{(b-1)/p}$$

In applying the foregoing type of analysis the following modifications 
or limitations should be considered: 

(i). We have assumed that for a fixed month-pair the infectivity
factor, $m^{-1}$, is constant, except for random variation. Were it not
for the fact of a declining number of susceptibles, $S_i$, with
increasing $C_i$, as described by equation (4) above (i.e. if $S_i$ were
constant over the range of $C_i$), the mass action curve as given by
equation (5) above would assume linear form with slope unity. But the
declining value of $S_i$ has the effect of causing the slope to drop with
increasing $C_i$, so that apart from sampling errors, the slope $b$ (and
$\beta$) will be less than unity. This can be seen quite easily if we
write equation (3) as

$$C_{i+1} = A_i C_i, \qquad A_i = \left(\frac{S_i}{m}\right)^p ;$$

taking logarithms, we obtain

$$\log C_{i+1} = \log A_i + \log C_i .$$

Then, if $A_i$ were constant for all $C_i$, a plotting of the curve (on a
log-log scale) would yield a straight line parallel to

$$C_{i+1} = C_i$$

at a vertical distance of $\log A_i$. But with the damping effect of the
decline of susceptibles as $C_i$ increases, $A_i$ correspondingly
decreases; and if, for example, $\log A_i$ is still positive the distance
between the two lines decreases with increasing $C_i$, and it then
follows that $\beta < 1$. The same result would, of course, apply when
$\log A_i$ is negative.

Now if the situation were such that as $C_i$ increases various preventive
measures are taken which significantly reduce the infectivity factor,
further damping will take place, and $b$ will become smaller. To take
this into account it would then be necessary to adjust upward the value
of $b$, resulting in a corresponding increase in the length of the
infection interval for each of the assumed values of $n$.

(ii). Equation (4) above implies that the entire population is
potentially susceptible, and that the only immunes present at any given
time are those individuals who have gained immunity for a short period by
virtue of recent infection. If, however, only a fraction, $q$, of the
entire population are potentially susceptible, then instead of (4), it
would be necessary to write

$$S_i = 1000q - nC_i \tag{11}$$

and substituting this, instead of (4), in (5), and in (7) and (8), it will
be seen that for the same observed value of $b$, a somewhat longer
infection interval would be estimated, depending on the degree of
departure of $q$ from unity.

(iii). Equations (4) and (11) will progressively lose accuracy as $n$
gets very large. Thus, if the period of immunity were to last several
months, these expressions would require modification to take account of
variation in $C_{i-1}$, ...

References (3) and (4) listed below, and others listed in these
references, discuss various aspects of the law of mass action of
importance in connection with epidemic theory.


REFERENCES


1. Soper, H. E. The Interpretation of Periodicity in Disease Prevalence. Jour. Roy.
Statist. Soc. 92, 34-61, 1929.

2. Chassan, J. B. The Autocorrelation Approach to the Analysis of the Incidence of
Communicable Diseases. Human Biology 20, 2, 90-108, 1948.

3. Wilson, E. B. and Worcester, Jane. The Law of Mass Action in Epidemiology.
Proc. Nat'l Acad. Science 31, 1, 24-34, 1945.

4. Wilson, E. B. and Burke, Mary H. The Epidemic Curve. Proc. Nat'l Acad.
Science 28, 9, 361-367, 1942.

QUERIES


QUERY 70: I am carrying forward research on little known or on
unknown tropical feedstuffs. For this research, rats, baby chicks
and pigs are being employed. The unknown feedstuffs are evaluated
singly and in combinations. I would appreciate your opinion on
the proper method of statistical analysis for our data.

As an example and for brevity, here are some actual data from a pilot
trial, together with the analysis of variance.


WEIGHT GAINS OF BABY CHICKS

                     Treatment
Chick No.      1      2      3      4     Entire sample
    1         55     61     42    169
    2         49    112     97    137
    3         42     30     81    169
    4         21     89     95     85
    5         52     63     92    154
  Total      219    355    407    714         1695


ANALYSIS OF VARIANCE

Sources        D.F.      S.S.      M.S.
Lot means        3      26235     8745**
Individual      16      11559      722
Total           19      37794


The F-test in the above case is highly significant, indicating that we
are not dealing with a single population. This method of analysis,
however, does not provide us with a means of stating that treatment No. 3
is better than No. 1 or No. 2 is better than No. 4, etc. Could you
provide us with the most valid method with which we could make these
comparisons?




ANSWER: Happily this perennial question has been provided with an
answer by Dr. John W. Tukey in the June issue of this Journal (Vol. 5:
pages 99-114, 1949). Tukey's method indicates a gap between the first
three treatments and the fourth. At a risk of less than one per hundred,
one would reject the hypothesis of no difference between treatments
No. 3 and No. 4.

There is not sufficient evidence to cut off the straggling mean of 
treatment 1 (P = 0.17). Finally, applying the F-test as indicated by 
Tukey, one does not reject the hypothesis that lots 1, 2, 3 are drawn from 
a common population (P = 0.1). 

I assume that your experiment was conducted so that environmental 
differences were randomly distributed over all the chicks in the experi- 
ment; otherwise, there is no unambiguous answer to the question about 
the effects of treatments. 
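
The analysis of variance quoted in the query can be reproduced from the
twenty weight gains. The sketch below is an ordinary one-way analysis of
variance, not Tukey's procedure for comparing individual means; the
function and variable names are illustrative.

```python
def one_way_anova(groups):
    """One-way analysis of variance for a list of treatment groups."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total

    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "df_between": df_between,
            "df_within": df_within, "ms_between": ms_between,
            "ms_within": ms_within, "f_ratio": ms_between / ms_within}


# Weight gains of baby chicks, one list per treatment (from the query).
chick_gains = [
    [55, 49, 42, 21, 52],
    [61, 112, 30, 89, 63],
    [42, 97, 81, 95, 92],
    [169, 137, 169, 85, 154],
]
result = one_way_anova(chick_gains)
# Rounded sums of squares: 26235 (lot means), 11559 (individual),
# 37794 (total); mean squares 8745 and 722; F is about 12.1.
```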


QUERY 71: In an experiment in which one half of the controls
reacted positively and one half negatively, it would seem that
chi-square should be the same whether one uses the formula,

$$\chi^2 = \sum (x - m)^2 / m,$$

or the formula for the 2 × 2 table,

$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)}.$$

But this is not the case. Why?

For example, suppose 200 animals are divided equally among experimentals
and controls. Then, according to the proposition under consideration,
suppose 50 controls live and 50 die, and suppose 63 of the
experimentals live and 37 die. Is the experimental procedure effective?

By the 2 × 2 table, $\chi^2$ = 3.438, not significant. But by the other
formula, comparing the experimentals with a 1 : 1 ratio, $\chi^2$ = 6.760,
highly significant. Why do not the two methods agree?


ANSWER: You have described two different experiments leading
quite properly to different values of chi-square. In the first
experiment there are only 100 animals, all treated experimentally. The
assumption is made that in the untreated population the ratio of the
numbers living and dying is 1 : 1. The hypothesis being tested is that
the same ratio applies to the treated population; that is, that the
treatment is without effect. The value of chi-square, 6.760, would lead
to rejection of the hypothesis with P approximately 0.01. In this
experiment there are no controls because the experimenter supplies the
information about how controls behave.

The second experiment contains 200 animals, but half of them are 
used to get evidence about the behavior of the untreated population. 
Here the experimenter either has no knowledge of the behavior of the 
controls or is unwilling to rely on his knowledge. In this experiment, 
the hypothesis being tested is that the experimentals and controls have 
the same ratio, but the value of the ratio is not specified. The
experimenter supplies less information than he did in the first
experiment. The result is that the same number of experimentals, divided
in the same ratio, lead to less certainty about the conclusion.

Querist feels that the chance division of the controls in the 1 : 1
ratio is equivalent to the 1 : 1 hypothesis which was set up in the first 
experiment. That this is not true may be clear if he considers the 95 
percent confidence interval based on a sample of 100 equally divided in 
outcome. This interval is from 40 percent to 60 percent. The corre- 
sponding 99 percent interval is from 37 percent to 63 percent. Evidently 
the information supplied by such a sample of controls is far less than that 
furnished by the experimenter in postulating the 1: 1 ratio for the 
population of controls. 
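
Both values of chi-square, and the confidence limits mentioned above,
can be checked directly. The sketch assumes the usual uncorrected
chi-square statistics and a normal approximation for the interval, which
reproduces the quoted limits to the nearest percent; the function names
are illustrative.

```python
import math


def chi_square_fixed_ratio(observed, expected):
    """Chi-square against expected numbers fixed in advance by hypothesis."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2 x 2 table with rows (a, b) and (c, d),
    without continuity correction."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def proportion_interval(successes, n, z):
    """Normal-approximation confidence interval for a proportion;
    z = 1.96 for 95 percent, 2.576 for 99 percent."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# First experiment: 63 live, 37 die, tested against a postulated 1 : 1 ratio.
first = chi_square_fixed_ratio([63, 37], [50, 50])   # 6.76
# Second experiment: controls 50 : 50, experimentals 63 : 37.
second = chi_square_2x2(50, 50, 63, 37)              # about 3.438
```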


QUERY 72: Some time ago, at a meeting of technical specialists in
maize, the requirements for approving or rejecting a hybrid were
discussed.

Someone suggested accepting hybrids when the yield differences were
statistically significant.

And here the controversy began. Another specialist took the floor to
set out his thinking on the matter. He said that if a trial were carried
out with great care, the requirements for considering a given hybrid
statistically superior to another (highly significant) would be very
small. For example, a difference of 3% in the results could be enough
for the statistical analysis to judge one hybrid superior.

This would lead to an error, since 3% in practice (in large-scale
cultivation) would be of no importance, so that the procedure was
mistaken. Instead he favored requiring a 10% difference in yields and
fixing a standard error of, for example, 6%.

Of course, I do not know for certain who is right, and so I turn to
you to settle the question. You may reply in English.




ANSWER: Yield trials of various crops are usually conducted for one
of two reasons: (1) to provide a test for a particular hypothesis or
(2) to provide information which can be used as a guide in making
recommendations over a range of soil and climatic conditions.

In the first instance an efficient experimental design and adequate 
replication are necessary so that the desired tests may be performed with 
the required precision. The number of replications and choice of design 
will, in part, be dictated by past experience as to soil variability, etc. 

In the more general case where yield trials are conducted to provide 
information which will serve as a guide in making general recommenda- 
tions, the situation is quite different. It is well established that different 
varieties respond differently in different years and at different locations. 
Therefore, varietal trials must be grown at several locations and in
different years. Thus, there is little point in striving for "statistical
significance" in each of the individual tests. An increase in number of
replications for any single test will have little effect in reducing the
magnitude of the variety × year or variety × location interaction.

The general practice in yield trials is not to select one or a few of the 
apparently superior items, but rather to discard a group of the poorer 
items. The items remaining are then tested further to provide additional 
information on performance. 

If a number of varieties are tested over a series of years and locations, 
the outcome will almost certainly be a group of varieties which are so 
similar in yield and other characteristics that the differences among 
them will not be statistically significant. The best estimates of the rela- 
tive value of the varieties in this group will be the actual averages 
obtained. 

G. F. SPRAGUE 




THE BIOMETRIC SOCIETY 


By the time this number of Biometrics reaches you, each member of 
the Society will have received his free copy of our first Directory. 
Additional copies have been printed to send to new members as they are 
enrolled. It is available to non-members for 50 cents. Until a new 
edition is warranted, we propose issuing an annual supplement. As you 
will have discovered, the Directory includes a list of officers, the consti- 
tution of the Society, the Council by-laws, and the statutes of each 
region as well as the alphabetical membership list and a geographical 
summary. The information provided for each member includes his 
professional connection as recorded in the Secretary’s office on June 15 
and his major field of interest. Later, we hope to summarize the dis- 
tribution of members among the different fields of interest. Although 
the Society has been in existence for less than two years, the geo- 
graphical breakdown shows that we had 888 members in 33 different 
countries when the Directory went to press. The first and largest 
organized region was the Eastern North American, with 478 members. 
The other regions in order of formation were the British with 111 
members, Western North American with 73 members, Australasian with 
37 members, Indian with 43 members and French with 47 members. 
In addition, there were 99 members-at-large. 

Since the last issue, the Council has approved the statutes of the 
Australasian, Indian and French Regions. These are already included 
in the Directory, so that they need not be reprinted here. 

Developments in France are of unusual interest. The biometricians
there have adopted a dual organizational plan in accord with a law of
1901 governing official French societies. They have formed the
autonomous Société Française de Biométrie. At the same time they have
formed the Région Française of the Biometric Society and provided that
all full members of the Société Française de Biométrie shall be members
of the Biometric Society. In view of this interesting development the
tentative proposal of a joint French-Italian region has been abandoned.


At the last meeting of the Société Française, on May 17 at the
Laboratoire de Zoologie de la Faculté des Sciences, Paris, the following
communications were presented: "La réhabilitation de l'homme moyen" by
M. Fréchet, "Facteurs latéraux et facteurs sexuels dans la morphologie
des empreintes digitales" by R. Turpin and M. P. Schützenberger, and
"Études biométriques sur le colibacille" by J. Dufrenoy.

Within the last months the following regional officers have been 
elected and confirmed by Council: British Region: Vice-President, 
J. W. Trevan; Secretary, D. J. Finney; Treasurer, K. Mather; Regional 
Committee, J. O. Irwin, J. I. M. Jones. Indian Region: Vice-President, 
P. C. Mahalanobis; Secretary, C. Radhakrishna Rao; Treasurer, Mohan- 
lal Ganguli; Regional Committee, V. M. Dandekar, K. Kishen, K. R. 
Nair, U. S. Nair, V. G. Panse, P. B. Patnaik, B. Ramamurthy, R. V. 
Sukhatme, V. D. Thawani. 

Since last November the Society has been provided with temporary 
headquarters in a pleasant room at 321 Congress Avenue in New Haven 
by the Department of Public Health of the Yale University Medical 
School. This room, however, will be required for new activities in the 
next academic year. Through the kindness of the Department of Ap- 
plied Physiology, the Society has had the good fortune of obtaining a 
larger room at 52 Hillhouse Avenue in the main part of the University, 
and moved there on July 5. We would be very glad to welcome any 
visiting members at our new headquarters. We are very sorry to lose 
the services of Mrs. Elizabeth Weinman, who was Executive Assistant 
to the Secretary through June 30. The Society has benefited greatly 
from her efficient handling of the many details of the Secretary’s office 
and wishes her well in her new undertaking. We have been fortunate 
in obtaining as her successor Mrs. Irving N. Fisher, who knows at first 
hand all of the countries where we have regions and most of the other 
countries where we have members. 




NEWS AND NOTES 


At the Raleigh branch of the Institute of Statistics there is a small 
news publication called the "Leaky Gasjet" which is printed irregularly
depending upon the quantity of choice gossip acquired by its faithful 
seekers. The following excerpt was taken from the June, 1949, edition. 


Dear Gasjet Editor: 


I am a newly created Ph.D. in Experimental Statistics and I am 
worried because I expect to do consultation and I am afraid that the 
research workers will ask me questions that I won’t be able to answer. 
What shall I do? 


Phidler 
Dear Phidler: 


Here are a few simple devices which should prove useful to you in 
your consulting work. Relax, once you have mastered them you have 
absolutely nothing to worry about. 


Research Worker: Confidently. I have done an experiment, Mr.
Phidler, in which I have two plants, one of each variety, in each pot
and fifteen pots. Can you tell me how to analyze it so as to show
that Variety A is taller than Variety B? I realize laughing
self-consciously that this is a very elementary question but . . .

Phidler: Frowning. Naturally as a new Ph.D. this is far too difficult a
question for him, but he is not alarmed. Just what do you mean by
taller?


This illustrates both the Device of the Counterquestion and the Device 
of the Definition of Terms. 


Research Worker: A bit taken aback. Taller? Well I mean bigger— 
not not bigger— 

Phidler: Sternly. Come now, we cannot get anywhere unless we have 
specific, operational definitions. 

Research Worker: Yes, of course. What I meant was I measured the 
height of each plant and— 




Phidler: The external or the internal height? He pauses, but Research 
Worker is unable to answer. A similar problem came up in the Jour.- 
Roy-Stat-Soc-Supple-eleventy two-page 476. 


The Device of the Non-Existent Reference


Research Worker: Awed. What was that reference again? 

Phidler: No matter. It’s by Gregory Hairshirt. I knew Hairshirt in 
kindergarten—an idiot—his papers were demolished by Smirkley 
Annals of Applied Human Genetics. Let’s get back to our little 
problem. 


The Device of Complete Familiarity with Everyone and Everything 
Research Worker: Relieved. Yes, Yes. Now I thought this design— 


Phidler: Design? Laughs Yes design. You realize of course that you 
should have used a cuboidal lattice in this experiment. 


The Device of the Wrong Design 
Research Worker: I—well I didn’t know—
Phidler: Aloud to the walls. How do these research workers expect us 
to get anything out of their data when they use any old design. Ah 


well, I suppose we can work it out by matrix methods. Tell me, 
what is the Cost Function for height in this problem? 


The Device of the Unnecessary Complication 


Research Worker: Cost? I don’t know—I thought this was a simple 
sob! problem—but after all I’m only a miserable research worker and 
not a statistician alas! 

Phidler. Benevolently. Now, now, don’t cry. I will help you. This is 
really a very simple problem. 


The Device of Reversing your Field 


Research Worker. On his knees. For you, perhaps, O Master. The 


research worker is now in the proper frame of mind for consultation. 
From here on in Phidler can do ANYTHING. 


AFRICA—Among our new members, Henri Marchand, Dakar, 
Sénégal, West Africa, writes, ‘My researches are purely theoretical in 
the field of mathematical genetics. As soon as my present studies on the 
part that a single body can have on the evolution of a population are 
advanced, it will give me great pleasure to send you a report on the 
results at which I will have arrived.” 




AUSTRALIA—Helen Turner had plans all completed to attend the 
Second International Biometrics Conference in Geneva and to spend six 
months in Cambridge. Unfortunately family illness has intervened and 
the trip has been postponed. D.B. Duncan is busy developing a teaching 
program in Statistical Methods in the University of Sydney. Three new 
courses have been set up and the first graduate in Agricultural Science 
with Honors in Statistical Methods, J. A. Morris, took his degree this 
March and is now working in animal genetics in the Division of Animal 
Health and Production of C.S.I.R.O. H. O. Lancaster of the Common- 
wealth Health Department has just completed a year’s study in England 
and is now on his way back to Australia. E. A. Cornish has a new $F_1$ to
carry on the statistical tradition. C. W. Emmens, author of the recently
published Principles of Biological Assay, is coping well with a large
demand for presentation of papers to scientific societies in Sydney.


FINLAND—Leo Törnqvist, Chairman of the Institute of Statistics,
University of Helsinki sends a brief note. He writes, “The Institute of
Statistics in the University of Helsinki was founded in 1945, but has 
started its activity only in 1947. Its Chairman is the professor in statis- 
tics of the University, and an M.A. works there as assistant. The 
Institute is partly a statistical library, partly an advisory and direction 
office for the students of statistics. In addition the chairman, the assist- 
ant, and the more progressed students work with special statistical 
researches for outsiders. The received tasks have chiefly been from the 
branches of population-prognostics and analysis of economic time- 
series. The teaching in statistics belongs in the University under the 
Faculty of the Political Sciences. The student can choose statistics in 
the M.A.-examination for his chief subject or for one of his side subjects. 
After the M.A.-examination it is possible to go on with the studies as 
far as to the doctoral thesis. My special interests in statistics are the 
theoretical and economical problems.” 


INDIA—D. N. Nanda has taken up the position of a Statistician for 
Indian Army Ordnance Corps. He writes, “In this capacity I am to 
conduct Applied Research on the following subjects: (1) Design and 
Analysis of Experiments, (2) Quality Control, (3) Sampling Surveys 
(including inspection methods). There are a number of other topics on 
which I may have to work from time to time.” He would appreciate 
being informed of the latest developments in these fields. 


UNITED STATES—On February 1, Alexander G. Ruthven, presi- 
dent of the University of Michigan, announced the establishment of the 
Institute for Social Research. “The institute will be directed by Rensis 
Likert and will provide a unified administration for two units already 
existing at the University, the Survey Research Center and the Research 
Center for Group Dynamics. Angus Campbell will succeed Mr. Likert 
as Director of the Survey Research Center, which will continue its 
major programs of research in such fields as: studies of economic behavior 
and motivation; studies in human relations and organization; studies of 
the American public’s understanding of major national and international 
issues; and the development of sampling survey methodology. Dorwin 
Cartwright will continue as Director of the Research Center for Group 
Dynamics. As a part of the Institute for Social Research this group will 
continue its program of research on the factors influencing productive 
and harmonious group functioning. It will continue its studies on human 
relations in industry, leadership, communication within groups, inter- 
group relations, and the social satisfaction of community life. As a 
result of the joining of the two centers, the Institute is better able to 
bring to bear quantitative and experimental research methods on com- 
plex and important social problems. Research findings of the Institute 
are communicated not only through teaching and scientific publications, 
but also through consultation and training in various organizations. 
The staff of the Institute includes over 350 persons engaged in full time 
or part time work. Approximately 125 of this number are located in 
Ann Arbor. Although most of the professional staff are social psycholo- 
gists, various other social sciences are represented.” Melville A. Taff, 
Jr., formerly with the Louisiana State Department of Health, New Or- 
leans, is now with the Territory of Hawaii Department of Health, 
Honolulu, as Chief of the Bureau of Health Statistics. Mr. Taff writes, 
“The Bureau is being expanded to provide statistical service for the 
entire department. Additional tabulating equipment has been ordered 
and more statistical personnel will be added as necessary. A central 
statistical service unit is the prime objective. An Act patterned after 
the Uniform Vital Statistics Act was passed at the 1949 session of the 
Legislature and now awaits the signature of the Governor. Once signed 
one of the first moves will be to consolidate small and sparsely populated 
registration districts and wherever possible and practical to appoint 
the local health officer as local registrar. Office methodologies are being 
reviewed and revised procedures are being written.” Paul T. Bruyere 
formerly with the Army Institute of Pathology is now with the Division 
of Tuberculosis, United States Public Health Service. He, with Martha 
Bruyere, is making a study of the early development of tuberculosis 
among student nurses. Jack Chassan also joined the United States 
Public Health Service and is working with the Bruyeres on the student 
nurse study. He recently left the Office of the Surgeon General, Depart- 
ment of the Army. Allen B. Burdick is now Assistant Professor of 
Agronomy, University of Arkansas, Fayetteville. He is initiating re- 
search in the development of grain and forage types of sorghum and will 
teach a course in the genetics of plant breeding. His theoretical research 
will continue to emphasize the mathematical aspects of quantitative 
inheritance. Mr. Burdick was with the Atomic Energy Commission 
at the Genetics Division, University of California, Berkeley. H. M. C. 
Luykx is resigning his position as Associate Professor of Preventive 
Medicine, at New York University College of Medicine, to accept 
appointment as Biometrician for the Atomic Bomb Casualty Commission 
in Japan. The Commission operates under the Committee on Atomic 
Casualties of the National Research Council, Washington, by directive 
of the President, and is sponsored by the Atomic Energy Commission. 
Mr. Luykx will be stationed in Japan for about two years, where he 
will make his home in Kure, with frequent visits to Hiroshima and Naga- 
saki. R. L. Murphree recently resigned his position with the Bureau of 
Dairy Industry at Jeanerette, Louisiana, to accept a position as Associate 
Professor of Animal Husbandry at the University of Tennessee. Ken- 
neth S. Cole, formerly with the Institute of Radiology and Biophysics 
of the University of Chicago, is now Scientific Director of the Naval 
Medical Research Institute at Bethesda, Maryland. Theodore A. 
Bancroft has joined the staff of the Iowa State College Statistical 
Laboratory as Associate Professor—July 1, 1949. Gobind Ram Seth, 
on a four months’ leave of absence from the Statistical Laboratory at 
Ames, flew early in July to visit the statistical institutions in Sweden 
and England, before returning to Delhi, India, where he will be teaching. 
Oscar T. Kempthorne was married in Vancouver, British Columbia, 
Canada, on June 10, 1949, to Miss Valda M. Scales of Coogee, New 
South Wales, Australia. Professor and Mrs. Kempthorne will be at 
home at 127 Stanton, Ames, Iowa, sometime in July. 

